Preface


The Hybrid Automated Reliability Predictor (HARP) integrated Reliability (HiRel) tool system for reliability/availability prediction stems from [...] in 1980 [...] of the Computer-Aided Reliability Estimation (CARE III) computer program that was conducted by the Research Triangle Institute in North Carolina. A participating reviewer, Dr. Kishor Trivedi of Duke University, did a mathematical analysis of CARE III and suggested changes [...]. Because of the [...] of the new features and mathematical changes, NASA decided to create a new capability called HARP. I was then the NASA CARE III [...]

[...] a number of doctoral [...] HARP, which became a joint Duke-Langley development project. Langley's contributions to the design included two sequence dependency gates, the redesign and implementation of the textual prompting interface, the integration of all HARP programs with uniform prompts, and other recommendations such as the incorporation of a stiff ordinary differential equation solver and the state truncation technique. The first working HARP program was sent to requesting beta test sites [...].

[...] an ANSI [...] When IBM Corporation announced support for the Graphical Kernel System (GKS), the [...] portability, Langley encouraged the Duke team to investigate [...] for HARP. The result of this work is the Graphics Oriented (GO) program. The GO program was completed at Langley with the help of students from Old Dominion University (ODU) working in Langley's Voluntary Services Program and [...] Koppen [...] Pamela J. Hale. The students tested the [...] for an IBM-compatible personal computer (PC). Koppen took GO from an alpha to a beta program and implemented many new features [...]. Because GKS is a versatile [...] standard, Hale reimplemented the PC GO program on the Sun Microsystems, Inc., and Digital Equipment Corporation VAX workstations.

[...] Arthur [...] and DeAn[...] E. [...], two ODU students working under Langley's Voluntary Services Program and my direction, implemented the code for the HARP Output (HARPO) program. Although I provided the initial design, Arthur made many refinements to produce the prototype program. Darrell Sproles, from Computer Sciences Corporation, [...] design and reimplemented the code. We jointly refined the design to [...]. With the completion of HARPO, the main components for HiRel (HARP, GO, and HARPO) were finished and beta testing commenced in May 1991.

Koppen and I also served as the engineering interface to over 100 beta test site users; this continued the beta test concept that I established at Langley for testing CARE III. Langley also served as an alpha and beta test site. All code was first extensively tested at Duke and then again at Langley before being distributed to the user community. The beta test program was a resounding success. The long-term Langley interaction with HARP users (8 yr) and HiRel users (2 yr) brought an important element of practicality to HiRel's usage and development. Many changes to HiRel resulted from beta site recommendations. These changes included the discovery of bugs, suggestions for improving the interface, and design modifications. In this regard, I wish to [...] Tilak Sharma at Boeing Commercial Airplane Group. Sharma saw the power of the fault tree sequence dependency gates and encouraged the HARP team to pursue this work. Sharma, who has become a trusted friend over our long association, extensively tested HARP and "gets the prize" for finding the most bugs of any beta site.

One important point remains to be made regarding the beta testing of reliability/availability programs. Several philosophies exist for the justification of a particular scheme of testing. Based on our experience with CARE III and HiRel, the method we used was extremely effective. We used a wide distribution of users involved in a large diversity of applications, from satellites to submarines. We imposed few restrictions on our choice of HiRel users (other than that they be U.S. users) and their applications. All distribution was made to unsolicited requesters. Because of this wide exposure, HiRel has become a very flexible and useful capability in many U.S. industries. We also serendipitously found an effective mechanism to transfer developed technology throughout the U.S.

Our experience with CARE III taught us that a useful program eventually becomes modified to suit the specific needs of the user. We had anticipated and fostered this need by distributing source code. Two additional components of HiRel have emerged as a result: phased-mission HARP and Monte Carlo HARP. We also learned that useful code gets absorbed into company and university computer programs and eventually loses its initial identity. These are excellent examples of NASA technology transfer.

To my colleagues and friends at Duke and Clemson Universities, I wish to say that the most 
rewarding episode of my professional career has been my association with you. Not only did your efforts produce a product worthy of the 21st century, but you taught me the true meaning of being a dedicated researcher and the friendship and loyalty it brings. It is a glowing tribute to your schools and this country to have such high-quality professionals.


Salvatore J. Bavuso 
Chapter 1

Introduction 


1.1. HiRel Tool System

[...] reliability/availability programs that [...] workstation or nonworkstation environment. HiRel consists of [...] generates. In this sense HiRel offers [...] that each program [...] Monte Carlo integrated HARP (MCI-HARP) [...], Phased Mission HARP (PM-HARP) (ref. 4), and X Window system HARP (XHARP) [...] program [...] (HARPO) program (described in vol. [...] of this TP). [...] components of HiRel (GO, HARP, MCI-HARP, [...]) [...] XHARP [...]

1.2. HARP Suite and Its Applicability 

[...] to the traditional combinatorial modeling approach. The addition of four special [...] ASCII file [...] necessary capability for [...].

1 COSMIC, The University of Georgia, 382 East Broad St., Athens, GA [...].
Boeing Commercial Airplane Group, Seattle, WA 98121 (Tilak Sharma).
Clemson University, Dept. of Computer Science, Clemson, SC 29731 (Robert Geist).


Figure 1. HiRel: GO, HARPO, and the HARP suite of reliability engines.


[...] situations where thousands of Markov states must be [...]. [...] fault-tolerant systems that use redundancy and subsystem reconfiguration to achieve [...] computation is conservative (refs. 8 and 9). It is conservative in that it predicts a reliability that is equal to or less than the reliability predicted by the full model.
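The conservativeness property stated above can be illustrated with a minimal sketch. It assumes a hypothetical non-repairable five-unit system with exponential failures and no FEHM that survives up to two coexisting failures. The truncated model drops every state with two failed units and counts it as system failure. The helper names (`transient_probs`, `death_chain`, `reliability`) are illustrative only and are not part of HARP. Transient probabilities are computed by uniformization rather than by HARP's ODE solver.

```python
import math

def transient_probs(Q, p0, t, tol=1e-12, max_terms=10_000):
    """Return p(t) = p0 * exp(Q t) for a small CTMC via uniformization."""
    n = len(Q)
    q = max(-Q[i][i] for i in range(n)) or 1.0          # uniformization rate
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / q for j in range(n)]
         for i in range(n)]                             # DTMC matrix P = I + Q/q
    weight = math.exp(-q * t)                           # Poisson(q t) pmf at k = 0
    term = list(p0)                                     # p0 * P^k, starting at k = 0
    result = [weight * x for x in term]
    mass = weight
    for k in range(1, max_terms):
        if 1.0 - mass <= tol:                           # remaining Poisson tail is negligible
            break
        term = [sum(term[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= q * t / k
        mass += weight
        result = [r + weight * x for r, x in zip(result, term)]
    return result

def death_chain(n_units, lam, failures_kept):
    """Generator of a hypothetical non-repairable n-unit system.

    States 0..failures_kept count failed units; state failures_kept + 1 is
    absorbing and collects system failure plus every truncated state."""
    size = failures_kept + 2
    Q = [[0.0] * size for _ in range(size)]
    for i in range(failures_kept + 1):
        rate = (n_units - i) * lam                      # any surviving unit may fail next
        Q[i][i + 1] += rate
        Q[i][i] -= rate
    return Q

def reliability(n_units, lam, failures_kept, t):
    Q = death_chain(n_units, lam, failures_kept)
    p0 = [1.0] + [0.0] * (len(Q) - 1)                   # start with no failed units
    return 1.0 - transient_probs(Q, p0, t)[-1]

# Full model: five units, the system survives up to two coexisting failures.
full = reliability(5, 1e-4, failures_kept=2, t=1000.0)
# Truncated model: states with two failed units are dropped (counted as failure).
truncated = reliability(5, 1e-4, failures_kept=1, t=1000.0)
print(full, truncated, truncated <= full)
```

Truncation only redirects probability mass into the failure state, so the truncated reliability cannot exceed the full-model value. How much lower it is depends on how probable the dropped states are.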


The [...] to the model by dropping improbable failure or recovery events [...] modeling subsystems [...]

[...] (critical-pair multifault model) to further reduce the user burden of acquiring multifault data that is generally unavailable.

This automatically generated near-coincident multifault model (critical-pair [...]) [...] occurs in [...] critically [...] exhaustion [...] redundancy [...]

An example of critically coupled units is two units in a voting triad [...] operation required for survival of the system, as in a flight control system. [...] The probability of a near-coincident fault is significant for highly reliable systems with system failure probabilities of less than 10^-8 for the mission time of interest. The [...] systems using computers can have up to four active reconfigurable processing units, where a majority vote can be effected until two coexisting faults occur.

[...] five processing units can survive two coexisting faults, but a third fault causes system failure. This system requires a critical-triple fault model to effectively predict the near-coincident [...]. Because the HARP developers [...] conservative approximation to critical-triple or higher order models. The degree of conservativeness depends on the system architecture and can be unacceptably high for some systems.
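The size of the near-coincident effect can be estimated with a back-of-the-envelope race between two exponential clocks. This is a deliberately simplified sketch, not one of HARP's FEHMs. It assumes an exponentially distributed recovery time with a hypothetical mean of 1 second and a hypothetical per-unit fault rate of 10^-4 per hour. It applies these to the four-unit voting system described above, where any second fault among the three remaining units during recovery is fatal.

```python
def near_coincident_probability(other_critical_units, lam, recovery_rate):
    """P(a critically coupled unit faults before recovery from a first fault ends).

    Assumes exponential fault arrivals (rate lam per unit) and an exponentially
    distributed recovery time (rate recovery_rate): a race between two clocks."""
    competing = other_critical_units * lam
    return competing / (competing + recovery_rate)

lam = 1e-4              # hypothetical per-unit fault rate, per hour
recovery_rate = 3600.0  # hypothetical mean recovery time of 1 s, as a rate per hour
# Four-unit voting system: after one fault, a fault in any of the other three
# units before recovery completes leaves no majority.
p_nc = near_coincident_probability(3, lam, recovery_rate)
print(f"{p_nc:.3e}")    # prints 8.333e-08
```

A per-fault probability of roughly 10^-7 looks negligible. However, it is of the same order as the failure-probability requirements of highly reliable systems, which is why near-coincident faults cannot be ignored there.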

The rationale to support the critical-pair modeling decision was based on the belief that the reliability of computers would continue to increase, making it less probable to [...] faults. Consequently, predictions of ultrahigh reliabilities would be achieved with four or fewer [...] fault-tolerant with redundant hardware units and possibly redundant software modules [...] processors. This trend is in fact occurring, which justifies the HARP developers' decision to model only critical-pair faults exactly (refs. 10 to 13).

For those applications requiring higher order fault models, the HARP developers suggest 
that the user modify the < HARP > ASCII files that specify the multifault model exactly. 
(See section 2.7.4.) The modification can be accomplished with a common text editor before or 
after HARP has generated the appropriate files. XHARP is another alternative that provides 
automatic higher order multifault model generation. The < HARP > multifault approximation 
model is further discussed in section 2.7, and an example is given in chapter 7. The user is 
cautioned to study these multifault models carefully before application. An incorrect selection 
of the multifault model options can produce a nonconservative result because a particular option 
can drop important failure modes from the reliability computation. However, < HARP >
cannot warn the user of this modeling specification error.


1.2.1. XHARP 


More recently, the user has another modeling alternative. An extended behavioral decomposition model has been developed by researchers at Clemson University (ref. 5) and is implemented in XHARP. XHARP was designed to expand the modeling capability of the original HARP behavioral decomposition technique to include exact multifault modeling, multiple entry/exit fault model transitions, and automatic behavioral decomposition modeling. This capability is demonstrated in chapter 7. XHARP calls HARP, MCI-HARP, and PM-HARP as executable software programs; thus, the entire power of both XHARP and the < HARP > programs is available to the user.


XHARP provides an X Window system environment for graphically specifying a semi-Markov 
chain that is automatically translated into the HARP structure for the fault-occurrence/repair
model (FORM) and the fault/error handling model (FEHM). 


1.2.2. PM-HARP 

Phased-mission HARP was developed to facilitate the analysis of phased missions (refs. 4 and 14). A mission is phased when the structure of the system (configuration) or the component failure distributions change after each epoch (phase) in the mission (refs. 15 and 16). Multiple phases of fixed and random durations are allowed. Also, the system can be specified to be imperfect at the beginning of a mission. The GO and HARPO programs are compatible with PM-HARP; however, the phased-mission specifications may not be directly specifiable in GO. HARPO may not graph all the phased-mission output data; however, the output listings are complete.
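The key point of a phased-mission analysis is that the system state at the end of one phase is carried into the next. The sketch below assumes identical, non-repairable units, each phase having its own failure rate and its own minimum number of working units. The two-phase mission and all parameter values are hypothetical. The code illustrates the idea and is not PM-HARP's method.

```python
import math

def phased_mission_reliability(n_units, phases):
    """Reliability of a non-repairable n-unit system over consecutive phases.

    phases: list of (duration, failure_rate, min_working_units). Failures only
    accumulate, so checking each phase's success criterion at the end of that
    phase is enough. The distribution of working units is carried from one
    phase into the next instead of restarting each phase with fresh units."""
    dist = {n_units: 1.0}                      # P(k units working)
    for duration, rate, min_working in phases:
        survive = math.exp(-rate * duration)   # per-unit survival over this phase
        new_dist = {}
        for k, pk in dist.items():
            for j in range(k + 1):             # j of the k units survive the phase
                pj = math.comb(k, j) * survive**j * (1 - survive)**(k - j)
                if j >= min_working:           # phase criterion still satisfied
                    new_dist[j] = new_dist.get(j, 0.0) + pk * pj
        dist = new_dist
    return sum(dist.values())

# Hypothetical two-phase mission for a 3-unit system:
# phase 1 needs 2 of 3 units, phase 2 needs only 1 of 3.
phases = [(1.0, 1e-3, 2), (10.0, 1e-4, 1)]
print(phased_mission_reliability(3, phases))
```

Multiplying separately computed phase reliabilities, each starting with all units working, would overestimate the mission reliability here. Units lost in phase 1 are not available in phase 2.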


1.2.3. MCI-HARP 

MCI-HARP comprises HARP with a Monte Carlo simulation engine and is fully integrated with HARP. MCI-HARP can solve all types of models that HARP can when the input is specified as a dynamic fault tree (the extended fault tree with sequence dependency gates that HARP accepts). At present, this capability excludes cyclic Markov models, which can be specified to HARP in the Markov chain format that HARP accepts. However, MCI-HARP can solve certain model types that HARP cannot, such as the non-Markovian models that arise when warm or cold Weibull spares are added to a Weibull fault-occurrence model. An important feature of MCI-HARP is the use of a variance reduction technique called importance sampling (ref. 17). Importance sampling makes it feasible to solve large models that contain widely separated time constants. Such models are called stiff and are common to highly reliable fault-tolerant systems. Although importance sampling is not a new technology, it has become more useful with the [...] chain models (refs. [...] 19). Another important feature is MCI-HARP's ability to solve very large models with or without model truncation [...]. MCI-HARP does not store the entire Markovian state space.
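The following is a minimal sketch of importance sampling for a rare failure event. It uses a single unit with a hypothetical failure rate of 10^-7 per hour and a 10-hour mission, so the true failure probability is about 10^-6. Plain Monte Carlo with 20 000 samples will almost always observe no failures. Sampling lifetimes from an inflated rate (here 1/T) and reweighting by the likelihood ratio gives a usable estimate instead. This simple failure-biasing scheme is an assumption for illustration and does not describe MCI-HARP's internal sampling strategy.

```python
import math
import random

def crude_mc(lam, mission_time, n, rng):
    """Plain Monte Carlo: fraction of sampled lifetimes shorter than the mission."""
    hits = sum(1 for _ in range(n) if rng.expovariate(lam) < mission_time)
    return hits / n

def importance_sampling(lam, biased_lam, mission_time, n, rng):
    """Sample lifetimes from an inflated failure rate and reweight each failure
    by the likelihood ratio f(x)/g(x) so the estimator stays unbiased."""
    total = 0.0
    for _ in range(n):
        x = rng.expovariate(biased_lam)
        if x < mission_time:
            # f(x)/g(x) = (lam e^{-lam x}) / (biased_lam e^{-biased_lam x})
            total += (lam / biased_lam) * math.exp((biased_lam - lam) * x)
    return total / n

lam, mission = 1e-7, 10.0            # hypothetical: failure probability about 1e-6
exact = -math.expm1(-lam * mission)  # 1 - exp(-lam * T), computed accurately
rng = random.Random(12345)
print(exact, crude_mc(lam, mission, 20_000, rng))
print(importance_sampling(lam, 1.0 / mission, mission, 20_000, rng))
```

With the seed fixed, the run is reproducible. The crude estimate is typically 0.0, while the importance-sampling estimate should land within a few percent of the exact value.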

1.2.4. Textual HARP 

Three versions of textual HARP (PC-DOS HARP 16-bit version, PC-DOS HARP 32-bit version, and PC-OS/2 HARP) are [...] available for operation on a personal computer. Textual HARP [...] workstations with the same limits as the PC-DOS HARP 32-bit version.


1.2.4.1. PC-DOS HARP 16-Bit Version


We developed and tested the PC-DOS HARP 16-bit version on an IBM PC [...] with [...]K of memory and have successfully executed it on PC 286, 386, and 486 class machines. Because of the memory constraints imposed by the 640K memory limit of MS-DOS, PC-DOS HARP 16-bit version cannot solve large models. The limits on the various parameters for PC-DOS HARP are given in table 1. Expanded [...] is possible with PC's [...] memory by changing the limit sizes of the HARP package. (See section 6.8.)


Table 1. HARP Parameters

                                                  Limit, textual HARP
Parameter                                         32-bit version    16-bit version

Max. no. of states in Markov chain                Sorted: 10 000    Sorted: 500
  (may be larger if truncation is used)           Unsorted: 500     Unsorted: 500
Max. no. of transitions in Markov chain           Sorted: 90 000    Sorted: 1500
                                                  Unsorted: 2050    Unsorted: 2050
Max. no. of symbols in model                      15 000            500
Max. no. of factors in model                      15 000            1500
Max. no. of terms in model                        15 000            750
Line length in input file                         80 char           80 char
Max. no. of characters in a parameter name        32 char           32 char
Max. length of rates and state names              12 char           12 char
Max. no. of nodes in fault tree                   255               255
Max. no. of component types in fault tree         95                15
Max. no. of basic events in fault tree            95                15
Coverage value precision                          Depends on FEHM   Depends on FEHM
Max. no. of incoming arcs per fault tree gate     70                16


Note: [...] PC-DOS HARP 16-bit version [...]. Because the number of states allowed in PC-DOS HARP 16-bit version is only 500, the user is advised to use a truncation level that restricts the model state size to less than 500.

In addition to being unable to solve large models, PC-DOS HARP 16-bit version has some [...] the simple [...] not allowed. State-dependent coverage factors cannot be used; hence no near-coincident fault calculations are performed. This restriction occurs because the failure probabilities due to near-coincident faults are comparable with the precision allowed by the PC 16-bit version.

[...] has been developed with Graphics Software Systems [...] and the HARP Output (HARPO) Graphics Display User's Guide (vol. 4 of this TP).

1.2.4.2. PC-DOS HARP 32-Bit Version

The PC-DOS HARP 32-bit version requires a 386 or higher class machine.

1.2.4.3. PC-OS/2 HARP

Other operating systems such as UNIX or OS/2 executing on a PC remove the 640K memory restriction and hence the restriction on model size. The full HARP capability has been ported to a PC under OS/2 and behaves identically to [...]; [...] results [...] are totally interchangeable. [...] of the HARP capability under OS/2 is that a DOS-compatible GKS program need not be upgraded to OS/2 GKS. Also, small models can be worked entirely on a 286 PC, or just the graphics can be displayed on the PC when an OS/2, UNIX, or VAX computer is necessary for large models.

1.3. HARP's Key Features and Overview

HARP's key features are summarized as follows:

• Very large system modeling (using MCI-HARP, or behavioral decomposition and bounds with truncation with HARP)

• Flexible method of modeling dynamic behavior (homogeneous/nonhomogeneous Markov chains)

• Automatic Markov chain generation from a fault tree description (particularly useful for large systems) or direct user input of the Markov chain

• User choice of seven fault/error handling models ranging in complexity from a simple [...] parameter estimation model to a complex Petri net model for detailed fault/error handling analysis

• Automatic insertion of fault/error handling models into Markov chains 

• Automatic parametric analysis 

• Phased-mission analysis 

• Non-Markovian models with Weibull cold and warm spares

• Written in ANSI standard FORTRAN and successfully ported to many different host computers, including IBM-compatible 286, 386, and 486 PC's (including AT&T 6300 with 640K), DEC VAX, Sun, CRAY Y-MP, Alliant, Convex, Encore, Gould, Pyramid, and Apollo

• Runs under MS/PC-DOS and Microsoft Windows NT, OS/2, DEC VMS and Ultrix, Berkeley UNIX 4.3, and AT&T UNIX 5.2

• Interactive graphical input/output workstation capability for DEC VAX under VMS, Sun, and IBM-compatible PC

• X Window system graphical model generation 

• Extensively and independently tested and applied to practical systems (over 8 years of 
industry beta testing and over 100 copies distributed) 

• Independently tested and evaluated within NASA 

HARP provides the user with a language to input a model and solves it for the system
reliability/availability for user-specified mission times. It uses behavioral decomposition to avoid
the problems of model largeness and model stiffness (refs. 6 and 7). 

The reliability model is decomposed along temporal lines into a FORM and a FEHM. The 
FORM contains information about the structure of the hardware redundancy, about the fault 
arrival processes, and about manual (off-line) repair. The user specifies the FORM either as a 
fault tree or a Markov chain. The FEHM (often called the coverage model) allows for permanent, 
intermittent, and transient faults (ref. 20), and models the (on-line) recovery procedure necessary 
for each type. The FORM/FEHM models are merged according to a user-specified multifault 
model. The resulting system reliability/availability model is a simplification of the originally
specified model. The correct specification of the multifault model is crucial for HARP to produce 
a conservative result. 

HARP also accepts as input a nominal value and a variation on all FORM input parameters. 
The nominal value is used for the reliability prediction, and the variation about the nominal value 
is used in an approximate (simpler) model to generate bounds about the predicted reliability.
Additionally, HARP supports the modeling of time-dependent failure rates by allowing a 
symbolic failure rate to be associated with a Weibull failure distribution. We caution the user 
that the use of Weibull distributions leads to a long solution time because the symbols must 
be reevaluated at each time step, but it can also lead to a more accurate model of the system 
under study (ref. 21). MCI-HARP’s simulation has been shown to be more efficient in solving 
Weibull models than a numerical integrator (ref. 3). A new feature in HARP is the use of state 
truncation to further avoid the problem of large models. 
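The extra cost noted above comes from the time-varying rate: a Weibull failure rate has to be recomputed at every solver step. The sketch below assumes the shape/scale parameterization h(t) = (beta/eta)(t/eta)^(beta-1), which may differ from how HARP's own Weibull inputs are defined, and assumes shape >= 1 so that h(0) is finite. It integrates dR/dt = -h(t)R with a fixed-step fourth-order Runge-Kutta method (not HARP's solver), so the result can be checked against the closed form R(t) = exp(-(t/eta)^beta).

```python
import math

def weibull_hazard(t, shape, scale):
    """Assumed Weibull failure rate h(t) = (shape/scale) * (t/scale)**(shape - 1).

    shape >= 1 is assumed; for shape < 1 the rate is infinite at t = 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def reliability_rk4(shape, scale, t_end, steps):
    """Integrate dR/dt = -h(t) R(t) with fixed-step RK4.

    Because h depends on t, the rate is re-evaluated at every stage of every
    step, which is the extra work a time-dependent rate imposes on a solver."""
    def f(t, r):
        return -weibull_hazard(t, shape, scale) * r
    dt = t_end / steps
    t, r = 0.0, 1.0
    for _ in range(steps):
        k1 = f(t, r)
        k2 = f(t + dt / 2, r + dt / 2 * k1)
        k3 = f(t + dt / 2, r + dt / 2 * k2)
        k4 = f(t + dt, r + dt * k3)
        r += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return r

shape, scale, mission = 1.5, 50_000.0, 10_000.0    # hypothetical values, hours
closed_form = math.exp(-((mission / scale) ** shape))
print(closed_form, reliability_rk4(shape, scale, mission, steps=200))
```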

Input data to reliability models can be inaccurate by as much as hundreds to thousands of percent (ref. 22). Because of these large errors and the recognition that reliability modeling is more often an art than a science, the user of HARP must view the results with a healthy dose of caution. When trade-off studies are performed with comparable input data, the computed results are meaningful relative to the models being compared. If the user is interested in arriving at absolute reliability predictions, then much caution should be used in the interpretation of the computed results. HARP outputs eight digits plus an exponent. The eight digits are displayed for the purposes of user calibration, that is, to determine whether the user's computer is computing the results that the developers intended. The eight digits do not imply reliability prediction precision to eight digits.
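A short calculation shows why eight displayed digits say nothing about accuracy. Consider a hypothetical triple modular redundant (2-of-3) system with perfect coverage and independent exponential units. Its unreliability grows roughly with the square of the failure rate, so a factor-of-10 uncertainty in the input rate moves the answer by a factor of about 100, while every printed digit still looks precise.

```python
import math

def tmr_unreliability(lam, t):
    """1 - R for a 2-of-3 system of independent exponential units, perfect coverage.

    With q = 1 - exp(-lam t) per unit, 1 - (3R^2 - 2R^3) simplifies to
    3q^2 - 2q^3, which avoids cancellation when q is tiny."""
    q = -math.expm1(-lam * t)
    return 3 * q**2 - 2 * q**3

t = 10.0                                  # hypothetical mission time, hours
nominal = tmr_unreliability(1e-5, t)      # hypothetical nominal failure rate
high = tmr_unreliability(1e-4, t)         # same model, rate 10 times larger
print(f"{nominal:.8e}  {high:.8e}  ratio = {high / nominal:.1f}")
```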

Experience has shown that, next to tedious hand calculations, the most practical method of developing confidence in the results computed by any reliability predictor is to compare the computed results of one program with those of a different reliability program. This recommendation is based on the authors' interaction with beta test site users over 8 years, during which coding bugs were discovered on several occasions as a result of users comparing HARP results with in-house company reliability programs, including some obtained from NASA (such as CARE III¹) and other institutions. The HARP developers also used this technique extensively (refs. 23 to 26).

¹Computer-Aided Reliability Estimation, third-generation computer program.
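The cross-checking practice described above can be as simple as solving the same model two independent ways. This sketch uses a hypothetical 2-of-3 system with perfect coverage. It compares the textbook closed form 3e^(-2λt) - 2e^(-3λt) with a Markov-chain formulation of the same system solved by a fourth-order Runge-Kutta integrator. A discrepancy larger than the integrator's tolerance would point to a modeling or coding error in one of the two.

```python
import math

def tmr_reliability_closed_form(lam, t):
    """2-of-3 system, perfect coverage: R = 3e^{-2 lam t} - 2e^{-3 lam t}."""
    return 3 * math.exp(-2 * lam * t) - 2 * math.exp(-3 * lam * t)

def tmr_reliability_ode(lam, t, steps=1000):
    """Same system as a Markov chain (3 up -> 2 up -> failed), solved with RK4."""
    def deriv(p):
        p3, p2 = p
        return (-3 * lam * p3, 3 * lam * p3 - 2 * lam * p2)
    dt = t / steps
    p = (1.0, 0.0)
    for _ in range(steps):
        k1 = deriv(p)
        k2 = deriv(tuple(x + dt / 2 * k for x, k in zip(p, k1)))
        k3 = deriv(tuple(x + dt / 2 * k for x, k in zip(p, k2)))
        k4 = deriv(tuple(x + dt * k for x, k in zip(p, k3)))
        p = tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                  for x, a, b, c, d in zip(p, k1, k2, k3, k4))
    return p[0] + p[1]                     # reliability = P(3 up) + P(2 up)

lam, t = 1e-4, 1000.0                      # hypothetical rate (per hour) and mission time
closed = tmr_reliability_closed_form(lam, t)
numeric = tmr_reliability_ode(lam, t)
print(closed, numeric, abs(closed - numeric))
```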

Textual HARP executes on DEC VAX workstations under VMS, [...] UNIX, and IBM-compatible 286, 386, and 486 PC's under MS-DOS. [...] standard [...] PC's, the full HARP capability [...] has been compiled with Lahey and Microsoft Fortran [...] on the PC. It is compatible with a wide range of computing platforms because it was written in ANSI standard Fortran 77 [...] Fortran 77 compiler, and textual HARP creates ASCII files that are compatible with most computing platforms. [...] for the PC environment can be executed by a VAX workstation, [...] used as a workstation for input and [...] computations. [...] required by other programs.


[Figure 2 flowchart; recoverable box label: "Change model parameters."]

Figure 2. HARP execution flow and relationship to GO and HARPO.


The programs also accept files created [...]. [...] can also come from files generated by the GO program [...] interactive input capability or simply input [...] structured files. These files can be used as input to [...] the user to graphically display the HARP tabular data in a wide variety of forms in an interactive mode. Thus, as an overview, textual HARP is, by analogy, the central [...]; the GO program is a graphical input to textual HARP [...] prompting capability, and HARPO is [...] HARP's files. Separate users' guides for GO and HARPO (vols. 3 and 4 of this TP) and a tutorial (vol. 2 of this TP) are also available. Table 1 gives the limits on various parameters defined in HARP [...].

1.4. HARP Version 6.0 

[...] features. Four novel [...] features of a system frequently characterized as sequence dependencies, which cannot [...] standard [...]. Hence, they greatly enhance the modeling capabilities of HARP. [...] equation solver [...] is taking too many steps to [...] altered [...] run time has been [...] more robust (section 4.5). However, [...] makes it much [...]. The program harpeng still reads in the old format of the MODELNAME.INP file [...] that the last two questions [...] asked by harpeng [...]. A simple run through harpeng clearly demonstrates these changes. [...] and error messages have been altered to clarify their meaning.

1.5. HARP Version 6.1 

[...] HARP [...] (see section 6 [...]) [...] modifications. The dynamic sequence [...] first must be [...] multiple basic events with identical failure distributions). The first [...] MODELNAME [...] renamed [...] each row of the [...] rates of the target state were utilized. (See section 2.7 for details.) [...] defaults are fully implemented, more extensive warning messages have been added to the code, and some coding bugs have been fixed.

1.6. HARP Version 7.0 

HARP version 7.0 [...] the number of [...] in the Markov [...] Markov chain states exceeds 90000. [...] increase the value of PLEN or rerun tdrive with a larger truncation value). (See section [...].) [...] useful if tdrive runs out of memory [...]. When tdrive is invoked and it senses the presence of existing [...] choices. One choice is AT. AT allows the user to append nodes to an existing tree in MODELNAME.TXT but only if the last existing [...]. If the last node in the MODELNAME.TXT file [...] should [...] delete the line containing the FBOX [...]. This feature is useful [...] wrong [...] after many nodes have been entered. This feature also precludes the need [...]. If the user [...]. [...] bugs were fixed and are delineated in appendix A.

1.7. HARP Quick Reference 


1.7.1. Summary— HARP Capabilities and Limitations 

HARP is intended for reliability [...] recovery management techniques, particularly those [...]. [...] sections provide a list of the capabilities and [...] determined during beta testing. Certain listed capabilities [...] same execution of HARP; thus, a fault [...] specification of the same system would not [...] specified. HARP creates [...] and solves the Markov chain. The [...] is not true. The limits on the size of the problem that HARP can solve depend on the system.


1.7.2. HARP and MCI-HARP Capabilities 


• Dynamic fault trees with repeated nodes (i.e., shared basic events)

• Repairable systems (to determine instantaneous availability), which are specified with a Markovian FORM

• Systems with sequence-dependent failures as dynamic fault trees or Markov chains

• Weibull failure distribution including hot spare repairable systems

• Weibull failure distribution with cold Weibull spares (MCI-HARP)

• [...] editing)

• Provide detailed coverage modeling with a choice of FEHM's

• Markov chains for [...]

• [...] given [...] fault tree and [...] with or [...]

• Systems with cold and warm spares (refs. 27 to 30) 

1.7.3. HARP Limitations 

• Mean time to failure (MTTF) or mean time between failures (MTBF) 

• Steady-state evaluation 

• Weibull failure distribution mixed with constant failure rates in repairable systems 

• Bounds analysis for systems with Weibull rates or with no absorbing states 

• Automatic generation of Markov chains for repairable systems 

• Phased missions; use PM-HARP (ref. 2)

• Weibull failure rate for stiff systems (use MCI-HARP) 

• Weibull failure rate for models containing the cold spare gate and warm spares (see 
section 2.9.1); use MCI-HARP (refs. 18 and 19) 

• Slow recovery with behavioral decomposition 

• Model systems whose unreliability is less than [...] when FEHM models are included
(unless the epsilon variable parameter EPX is changed; see section 3.3.3)

1.7.4. About Volume 1 

Volume 1 of this Technical Paper is a user's guide for the textual HARP program, which [...]. Chapter 2 describes the various steps needed to completely specify a system in HARP. It also [...] available. Chapter 3 [...] solution techniques used in HARP, and chapter 4 presents an overview of the HARP program and [...] along with the user input. Chapter 5 provides practical [...] HARP programs. Chapter 6 gives a mathematical description of the nonstandard fault tree [...]. Appendix A [...], and appendix B lists warning and error messages. The tutorial (vol. 2 of this TP) [...] through several examples and further explains many of the HARP concepts. Additional applications can be found in references [...] and 29 to 32.
Chapter 2

Model Specification 

2.1. System Mathematical Model Overview 

The reliability model that HARP actually solves is always a Markov chain, even though it can be input as a combinatorial or sequence-dependent fault tree. Depending on the user's choice, the model can be a homogeneous Markov chain (only exponential failure distributions, i.e., constant failure rates) or a nonhomogeneous Markov chain (at least one Weibull failure distribution, i.e., a nonconstant failure rate). Although these general stochastic models cover a wide range of systems, some systems require even more general and computationally more difficult stochastic models. These are the highly reliable fault-tolerant systems that use redundant subsystems to increase system reliability. Such systems often use computers for real-time-processing control and for system management of failed redundant components.

Because failure recovery requires either a random or possibly a deterministic time, a second component failure can occur while the first is being dealt with. The second fault is called a near-coincident fault. The reliability/availability of highly reliable fault-tolerant systems is sensitive to near-coincident faults, which are typically the dominant unreliability contributor. To capture the effects of this important parameter, a semi-Markov chain model is required to account for the system holding time during system recovery (recovery time) when fault occurrences are exponentially distributed. If fault occurrences are Weibull distributed, the stochastic model becomes a mixed-Markov chain, an even more complex mathematical model. Such stochastic models are computationally costly to solve in the traditional manner, which severely limits the size of the model. Thus, HARP was designed to model these systems efficiently.

2.2. Fault/Error Handling Mathematical Model 

A mathematical technique that significantly simplifies the solution of both these models is called behavioral decomposition. The technique makes use of the fact that fault occurrence times are typically on the order of thousands to tens of thousands of hours, while fault recovery times are on the order of fractions of a second to seconds. This disparity of event times makes it possible to solve a FEHM in isolation from the FORM. The solution of the FEHM determines the times of exit from the FEHM that result from race conditions internal to the FEHM; these are expressed as exit probabilities and holding times. Timing considerations are carefully modeled within the FEHM. Once the exit probabilities and holding times are determined, the behavioral decomposition model assumes that the recovery outcome (FEHM exit) happens in zero time. The exit probabilities behave as exit path switches with infinitely fast switching speed. These switch probabilities are often called coverage probabilities in the literature. The coverage probabilities are automatically incorporated into the Markov chain (i.e., the homogeneous or nonhomogeneous model), which is solved with a straightforward ordinary differential equation solver (GERK).
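The two steps of behavioral decomposition can be sketched under strong simplifying assumptions. The toy FEHM below has only two competing exponential exits (successful recovery or single-point failure), so its exit probabilities are simple rate ratios. The FORM is a hypothetical three-unit system that reconfigures down to one working unit. The holding time is computed but, as in behavioral decomposition, the FORM treats the FEHM exit as instantaneous. Neither function reproduces HARP's FEHMs, its multifault model, or its GERK solver.

```python
import math

def fehm_exit_probabilities(recovery_rate, single_point_failure_rate):
    """Solve a toy FEHM on its own: two competing exponential exits.
    Returns (coverage c, single-point failure probability s, mean holding time)."""
    total = recovery_rate + single_point_failure_rate
    return recovery_rate / total, single_point_failure_rate / total, 1.0 / total

def form_reliability(n_units, lam, c, t, steps=2000):
    """FORM: k working units fault at rate k*lam; each fault is covered with
    probability c (k -> k-1) or causes immediate system failure with 1 - c.
    The FEHM exit is treated as instantaneous, as in behavioral decomposition."""
    def deriv(p):                       # p[k] = P(k units working), k = 0..n
        d = [0.0] * (n_units + 1)
        for k in range(1, n_units + 1):
            out = k * lam * p[k]
            d[k] -= out
            if k > 1:
                d[k - 1] += c * out     # covered fault: reconfigure to k - 1 units
        return d                        # uncovered and last-unit faults leave the chain
    dt = t / steps
    p = [0.0] * n_units + [1.0]
    for _ in range(steps):
        k1 = deriv(p)
        k2 = deriv([x + dt / 2 * k for x, k in zip(p, k1)])
        k3 = deriv([x + dt / 2 * k for x, k in zip(p, k2)])
        k4 = deriv([x + dt * k for x, k in zip(p, k3)])
        p = [x + dt / 6 * (a + 2 * b + 2 * cc + d)
             for x, a, b, cc, d in zip(p, k1, k2, k3, k4)]
    return sum(p[1:])                   # probability at least one unit still works

c, s, hold = fehm_exit_probabilities(3600.0, 0.36)     # hypothetical FEHM rates, per hour
print(c, s, hold)
print(form_reliability(3, 1e-4, c, 1000.0), form_reliability(3, 1e-4, 1.0, 1000.0))
```

Comparing the two reliabilities shows how even a coverage very close to 1 lowers the predicted reliability. This is why the coverage model matters for highly reliable systems.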

HARP and XHARP offer two classes of FEHM's: single-fault and multifault models. The single-fault model capability ranges from simple to complex, while the multifault model capability is relatively simple. It uses a near-coincident model that causes system failure resulting from user-specified synergistic critical-pair faults. In contrast, XHARP has a near-coincident multifault model that is general and removes the critical-pair fault restriction of HARP. The more general model in XHARP is especially useful for systems where many fault containment regions are modeled and more than two near-coincident faults can be tolerated.




2.3. Implementation of System Mathematical Model 

The fundamental mathematical basis of HARP's behavioral decomposition is the result that the
instantaneous jump model approximation is conservative (ref. 8). This guarantee holds only for a
correctly specified model. An incorrectly specified model does not guarantee a conservative
computation. Figure 1 shows how the different models involved in the HARP behavioral
decomposition relate to one another, and how they relate to the explicit specification of
Markov chains.



Figure 1. Relationship of modeling considerations with behavioral decomposition (original HARP
model specification; model generation by the behavioral decomposition process).


2.3.1. Behavioral Decomposition Model 

The conceptual model M0 is formulated from the system under consideration. How that
formulation is carried out is beyond the scope of this document. The user translates the
conceptual model M0 into the HARP paradigm model M1 by using the FORM/FEHM notation,
entering the FORM either as a fault tree or as a Markov chain. The choice of notation is
multifaceted. It depends on the user's inclination and familiarity, the need to model
near-coincident faults, and modeling complexity.

Figure 1. Relationship of modeling considerations with direct Markov chain modeling.


2.3.2. Markov Chain Model 

In some cases, direct Markov chain entry becomes a reasonable choice. The difficulty of this
approach lies in model specification, not in solution. Specifying a Markov chain is tedious and
error prone.

2 . 4 . Important Modeling Considerations 

Model M1 is a simpler model obtained from the conceptual model M0 through a process of
simplification. The objective is to estimate the total error and to report the desired result in
light of what is known about model M0, e.g., its rationale, assumptions, and simplifications.

Examples of M1 models are shown in chapter 3. The specification of the near-coincident
multifault model is also part of M1 but is not shown graphically. Given the FEHM's, the
resulting transformed model (M2) is a Markovian model. HARP produces model M2 automatically
and mechanically, and M2 is a mathematical approximation of model M1. Examples are
shown in chapters 3 and 5. The McGough-Trivedi theorem shows that model M2 is mathematically
conservative with respect to M1. In M2, the FEHM transition to a system failure state is forced
to occur sooner than it would occur in reality. This property is the basis of the McGough-Trivedi
proof, and it guarantees that the Markovian model M2 produces a conservative result with respect
to M1.
HARP performs the transformation from M1 to M2 automatically, but the specification of M1 is a
manual process. The user must therefore ensure that no significant failure modes of M0 are
ignored when M1 is specified. The same care is required when a text editor is used to edit the
ASCII files generated in HARP. Specifying the wrong multifault model to HARP during the M0 to M1
specification can produce a nonconservative result, because the incorrectly specified
multifault model can drop important failure modes from M0. The user must understand how M0
is conceptually related to M1. (See chapters 3 and 7.)

Another important issue in using these models is the conservatism or nonconservatism that the
multifault models produce. In any modeling exercise, the user must grapple with the trade-off
between modeling complexity and computational accuracy. The HARP developers intended HARP to be
applied to practical systems, so they used many mathematical techniques to reduce model
complexity while still providing useful results. Behavioral decomposition and the HARP multifault
models were selected for this purpose. Another extremely useful model reduction technique is the
Markov model truncation scheme described in section 2.8. This technique is especially useful in
solving the extremely large Markov models that can easily result from a modest-looking fault tree.

The advantage of behavioral decomposition is that fault/error handling contributes at most
two additional Markov chain states, no matter how complex the FEHM's are and no matter how many
FEHM's are included. The savings in computation for typical systems of interest can be
substantial. These savings make it entirely feasible to model intricate fault/error handling
detail, even when the FEHM itself is non-Markovian. This is the case for the Extended Stochastic
Petri Net (ESPN) FEHM, and also when deterministic (constant) recovery times are specified in a
number of HARP FEHM's.

The disadvantage of this scheme is that the disparity between the FEHM and FORM event times
affects the accuracy, and hence the degree of conservatism, of the HARP predictions. The farther
apart the event times are, the more accurate the results become. As the event times approach
each other, the accuracy decreases, but the deviation always accumulates on the conservative
side. For typical highly reliable systems, the time disparity is six or more orders of magnitude.
This virtually ensures a result much more accurate than the accuracy of the input data could
ever justify.

As a worst-case example of disparity, consider a two-triad system with processor failure rates of
10^-4 per hour. It showed a conservative deviation of about 80 percent when the event times were of
the same order of magnitude. Note what this model implies: the recovery time is about
the same as the expected time to failure of one component. Such a system is hardly realistic, yet
HARP still yields an acceptable result. As the time disparity increases, the accuracy increases.
One order of magnitude difference in time disparity produced a deviation of 6 percent, and two
orders of magnitude produced a deviation of 0.5 percent. Failure rate values can be in error
by hundreds to thousands of percent (ref. 22). Compared with that, the relative deviation
resulting from behavioral decomposition is minuscule, even for this pathological example.
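The effect of time disparity on the conservative deviation can be explored with a toy model. The sketch below is hypothetical and is not the two-triad system described above. A triad of processors, each with failure rate λ, enters an explicit recovery state after the first fault and leaves it at rate δ. The system fails if a second processor fails (rate 2λ), either before recovery completes or after reconfiguration to two processors. The failure hazard is 2λ both during and after recovery, so the exact unreliability has a closed form that does not depend on δ. The decomposed model replaces the recovery state by a coverage probability c = δ/(δ + 2λ). The mission time and rates are assumptions, and the printed numbers illustrate only the trend.

```python
"""Toy comparison of an explicit-recovery Markov model with its
behavioral-decomposition approximation (hypothetical model, closed forms)."""
import math


def unreliability_exact(lam, t):
    """Exact model: S3 -3lam-> R, R -delta-> S2, R -2lam-> F, S2 -2lam-> F.

    The hazard into F is 2*lam in both R and S2, so x = P(R) + P(S2) solves
    x' = 3*lam*exp(-3*lam*t) - 2*lam*x with x(0) = 0, independent of delta.
    """
    p3 = math.exp(-3 * lam * t)
    x = 3 * (math.exp(-2 * lam * t) - p3)
    return 1 - p3 - x


def unreliability_decomposed(lam, delta, t):
    """Decomposed model: S3 -3*lam*c-> S2, S3 -3*lam*(1-c)-> F, S2 -2*lam-> F,
    where c = delta / (delta + 2*lam) is the FEHM coverage."""
    c = delta / (delta + 2 * lam)
    p3 = math.exp(-3 * lam * t)
    p2 = 3 * c * (math.exp(-2 * lam * t) - p3)
    return 1 - p3 - p2


lam, t = 1.0e-4, 1.0e4  # per hour and hours (hypothetical values)
exact = unreliability_exact(lam, t)
for ratio in (1, 10, 100, 1000):  # delta / lam, i.e., the time disparity
    approx = unreliability_decomposed(lam, ratio * lam, t)
    print(f"delta/lam = {ratio:5d}: relative deviation = {(approx - exact) / exact:.4f}")
```

In this toy model the deviation equals 3(1 - c)(e^(-2λt) - e^(-3λt)). It is always positive and shrinks as δ/λ grows, which matches the qualitative behavior described above.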

The advantage of HARP's multifault models is that, for the majority of practical systems
(up to four critically coupled units), HARP automatically generates an effective model.
For systems with more than four critically coupled units, HARP produces less accurate but
always conservative results. (See section 2.7 for selection of conservative fault models.) HARP
multifault models are also easy to specify, and they require no further (usually unavailable) data.

The user has two alternatives if the conservative deviation is unacceptable. First, manual editing
of the HARP-generated ASCII files allows the specification of detailed multifault models for
more accurate predictions. Second, the XHARP program (see section 1.1) contains an automatic
model generation capability with a detailed multifault model that produces more
accurate coverage computations than HARP. The XHARP multifault model requires more input data
from the user (ref. 5). The trade-off between these two extended techniques relates to model
size. XHARP requires a Markov chain input specification, which can be tedious for the user to
input for large models. HARP, on the other hand, generates a large model from a fault tree that
is relatively easy to specify, but some file editing is needed to accurately model the more
complex multifault model. When one needs the modeling power of the HARP FEHM's, no easier
alternative presently exists.
Behavioral decomposition can also be applied to mixed-Markov models, but no theorem has yet
been proven that guarantees a conservative result for this model type. However, a nonconservative
result for a practical system has not yet been demonstrated. The following sections describe the
information needed to completely specify a system to be modeled by HARP.


2.5. Fault-Occurrence/Repair Model 


The fault-occurrence/repair model contains information about the structure of the system
(how many components of what type, interconnected in what way). It also contains information
about the failure and repair processes (how often each component type fails and how long repair
takes). This information can be entered either as a Markov chain or, for nonrepairable systems,
as a fault tree. The choice depends on which representation is more appropriate. A state-space
representation suits small models, where the limit is set by the user's willingness to input the
model manually. A fault tree representation of the system suits large models.

A Markov chain is entered as a state-transition rate diagram, in which each state represents
a particular configuration of the system. Transitions between states represent units failing or
being repaired. A fault tree is a model that graphically and logically represents the various
combinations of events occurring in a system that can lead to system failure. The
fundamental logic gates of fault trees allowed by HARP are the and gate, the or gate, the k/n
gate, the inv gate, and the xor gate. A k/n gate is used when the occurrence of at least k
of n possible events causes failure. The basic events in the fault tree represent failure of the
components that form the system being modeled.
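The static gate semantics listed above can be sketched as a small evaluator. The tuple encoding of the tree, the two-input form of the xor gate, and all names are assumptions made for illustration. None of them is HARP's input format.

```python
"""Evaluate a static fault tree (and, or, k/n, inv, xor) for a set of
failed basic events. Hypothetical encoding: a basic event is a string;
a gate is a tuple (kind, *children)."""


def occurs(node, failed):
    """Return True if the event represented by node has occurred."""
    if isinstance(node, str):  # basic event
        return node in failed
    kind, *args = node
    if kind == "and":
        return all(occurs(child, failed) for child in args)
    if kind == "or":
        return any(occurs(child, failed) for child in args)
    if kind == "k/n":  # at least k of the n inputs have occurred
        k, *children = args
        return sum(occurs(child, failed) for child in children) >= k
    if kind == "inv":
        (child,) = args
        return not occurs(child, failed)
    if kind == "xor":  # assumed two-input exclusive or
        left, right = args
        return occurs(left, failed) != occurs(right, failed)
    raise ValueError(f"unknown gate type: {kind!r}")


# Hypothetical system: fails if 2 of 3 processors fail or the bus fails.
tree = ("or", ("k/n", 2, "P1", "P2", "P3"), "BUS")
print(occurs(tree, {"P1"}))          # single processor failure
print(occurs(tree, {"P1", "P3"}))    # two processors failed
print(occurs(("inv", tree), set()))  # an inv gate at the top flips the meaning
```

The last line shows why an inv gate at the top of the tree changes how the results must be read: the top event becomes "system operational" rather than "system failed".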

When the user enters the FORM, as either a fault tree or a Markov chain, the component failure
rates are initially specified in symbolic form as symbolic failure rate names. Numerical values
are requested later, when harpeng is executed or, in some cases, by fiface. This scheme allows
efficient solution of the model in sensitivity or trade-off analyses that examine several sets of
numerical data. Symbolic failure rate names should not contain special characters, because these
can interfere with the user's operating system. In particular, do not use the symbols $ or &.

HARP provides four dynamic gates for modeling sequence dependencies: the functional dependency
gate, the priority and gate, the cold spare gate, and the sequence-enforcing gate. HARP converts
a fault tree into an equivalent Markov chain for solution (see chapter 6), so adding these
dependency gates is a natural extension of the fault tree notation. Many applications have
demonstrated their modeling power.

The functional dependency gate (fig. 5) has a trigger input and one or more dependent basic
events. The occurrence of the trigger event causes the dependent basic events to occur. The
occurrence of any dependent basic event has no direct effect on the trigger event. A functional
dependency gate is useful when the occurrence of some event (say a node failure) causes some
other components to become unusable (e.g., sensors that are connected to the node). In this case,
the dependent components are assumed to have failed, but no FEHM is invoked. The nondependent
output from the dependency gate reflects the status of the trigger event. This output serves to
simplify the drawing of large fault trees, because it can be used as an input to some other gate.
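The forcing behavior of the functional dependency gate can be sketched as a transformation on the set of occurred basic events. The function and event names are hypothetical. The sketch captures only the rule stated above: the trigger forces the dependents, and the dependents never affect the trigger. It ignores timing and coverage.

```python
"""Sketch of functional dependency gate semantics (hypothetical API)."""


def apply_fdep(occurred, trigger, dependents):
    """Return (effective set of occurred events, nondependent output).

    occurred:   set of basic events that have occurred on their own
    trigger:    callable taking the occurred set and returning True if the
                trigger event (a basic event or some gate output) has occurred
    dependents: basic events that become unusable when the trigger occurs
    """
    trigger_occurred = trigger(occurred)
    effective = set(occurred)
    if trigger_occurred:
        # Dependents are treated as failed; no fault/error handling is modeled here.
        effective |= set(dependents)
    # The nondependent output simply mirrors the trigger status.
    return effective, trigger_occurred


def node_failed(occurred):
    """Hypothetical trigger: failure of a node named NODE."""
    return "NODE" in occurred


sensors = {"S1", "S2"}
print(apply_fdep({"NODE"}, node_failed, sensors))
print(apply_fdep({"S1"}, node_failed, sensors))  # a dependent cannot fire the trigger
```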

The priority and gate (fig. 6) is essentially an and gate with two inputs, with the added
restriction that the input events must occur in order. If the two inputs are A and B, the
priority and gate fires if both input events occur and event A occurs before event B. The
gate produces no output if event B occurs before event A.

Figure 5. Functional dependency gate (trigger input, nondependent output, dependent basic events).

Figure 6. Priority and gate (output label: A and B occur).

Figure 7. Cold spare gate.
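The ordering rule of the priority and gate can be sketched in terms of event occurrence times. The sketch also covers the left cascading described later in this section. Two conventions are assumptions of this sketch: "never occurs" is represented as infinity, and ties are treated as not firing.

```python
"""Sketch of priority and gate semantics on occurrence times (hypothetical API)."""
import math

NEVER = math.inf  # occurrence time of an event that does not occur


def priority_and(t_a, t_b):
    """Occurrence time of the gate output, or NEVER.

    The output occurs when B occurs, provided A occurred strictly earlier.
    If B occurs first, the gate produces no output.
    """
    if t_b != NEVER and t_a < t_b:
        return t_b
    return NEVER


def cascaded_priority_and(times):
    """Cascade from the left: ((A pand B) pand C) ... requires A < B < C < ..."""
    result = times[0]
    for t in times[1:]:
        result = priority_and(result, t)
    return result


print(priority_and(1.0, 2.0), priority_and(2.0, 1.0))
print(cascaded_priority_and([1.0, 2.0, 3.0]), cascaded_priority_and([2.0, 1.0, 3.0]))
```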

The cold spare gate (fig. 7) has a primary input and one or more alternate inputs, all of which
are (possibly replicated)7 basic events. The alternate units are precluded from failing until they
become operational. When the primary input unit fails, it is replaced by the first designated
alternate unit. The alternate functional units (or cold spare units) are not allowed to fail while
they are dormant.8 This rule has one exception. If a cold spare is functionally dependent on
another component (i.e., it is a dependent event of a functional dependency gate), the cold spare
may actually be unavailable when needed, because the trigger event has occurred. Hence, a cold
spare gate does not prevent one of its spares from being caused to fail by a functional dependency
gate. Note also that a spare component can be shared by two or more cold spare gates (i.e., pooled
spares are possible).

7 A replicated basic event represents multiple failure events having identical failure
distributions. Using this replication notation significantly reduces the size of HARP-generated
Markov models.
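Consider a cold spare gate whose spares are unshared and not functionally dependent. Under the dormancy rule stated above, each unit serves its full operating life in turn, and the gate fails only when the last unit fails. Its failure time is therefore the sum of the unit lifetimes, each measured from activation. The Monte Carlo sketch below checks this for exponential units. The function names, rates, and seed are hypothetical. Pooled spares and functional dependencies are deliberately left out.

```python
"""Sketch: failure time of a cold spare gate (unshared spares, no FDEP)."""
import random


def cold_spare_failure_time(lifetimes):
    """Primary runs first; each spare starts only when the previous unit fails.

    Dormant spares cannot fail, so each lifetime is measured from activation
    and the gate fails when the last unit fails.
    """
    return sum(lifetimes)


def mean_failure_time(rate, n_units, trials, seed=1):
    """Monte Carlo estimate of the mean gate failure time with exponential units."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        lifetimes = [rng.expovariate(rate) for _ in range(n_units)]
        total += cold_spare_failure_time(lifetimes)
    return total / trials


# Hypothetical: one primary plus two cold spares, each failing at 1e-3 per hour.
estimate = mean_failure_time(rate=1.0e-3, n_units=3, trials=20000)
print(f"estimated mean time to gate failure: {estimate:.0f} hours (exact: 3000)")
```

Spares cannot fail while dormant, so this assumption gives an optimistic view of reliability.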

The sequence-enforcing gate9 is similar to the cold spare gate but has some unique, important,
and subtle properties not present in the cold spare gate. It controls the ordering of events in a
manner similar to that of the cold spare gate. That is, the input events are constrained to occur
in the left-to-right order in which they appear under the gate. The leftmost event must occur
first, then the event on its immediate right, then the event to the right of that, and so on.
There can be any number of inputs (see fig. 8). The first input can be a (possibly replicated)
basic event or the output of some other gate. All inputs other than the first are limited to being
(possibly replicated) basic events. The sequence-enforcing gate differs from the cold spare gate in
the way the two gates treat shared events. The specification of the gate is straightforward, but
its modeling implications are not. The effect of failures associated with this gate can be local
(relative to the component) or global (relative to the entire Markov chain). In some cases, the
sequence-enforcing gate can be used to describe state-dependent FEHM's in a fault tree. (See
section 4.7 for the concepts and chapter 6 for an example.)


Figure 8. Sequence enforcing gate (SEQ; A(j+1) is only allowed to occur after A(j)).
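The left-to-right ordering constraint of the sequence-enforcing gate can be sketched as a check on occurrence sequences, together with a count of which occurrence patterns remain reachable. The function names are hypothetical. The sketch covers only the ordering rule stated above. It does not cover the gate's treatment of shared events or its local and global effects.

```python
"""Sketch of the ordering rule of a sequence-enforcing gate (hypothetical API)."""
from itertools import combinations


def next_allowed(inputs, occurred):
    """Only the leftmost input that has not yet occurred may occur next."""
    for name in inputs:
        if name not in occurred:
            return {name}
    return set()


def is_allowed_sequence(inputs, sequence):
    """True if the events in sequence occur in an order the gate permits."""
    occurred = set()
    for event in sequence:
        if event not in next_allowed(inputs, occurred):
            return False
        occurred.add(event)
    return True


def reachable_patterns(inputs):
    """Subsets of inputs that can have occurred under the constraint."""
    patterns = []
    for size in range(len(inputs) + 1):
        for subset in combinations(inputs, size):
            if is_allowed_sequence(inputs, [e for e in inputs if e in subset]):
                patterns.append(set(subset))
    return patterns


inputs = ["A1", "A2", "A3", "A4"]
print(is_allowed_sequence(inputs, ["A1", "A2"]), is_allowed_sequence(inputs, ["A2"]))
print(len(reachable_patterns(inputs)), "of", 2 ** len(inputs), "occurrence patterns reachable")
```

Only the prefixes of the input list survive. This suggests how ordering constraints of this kind can shrink the state space of a Markov chain generated from a fault tree.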

Note the restrictions on the inputs of the four dynamic gates previously described. All inputs 
to the cold spare gate must be (possibly replicated) basic events. In the functional dependency 
gate, the trigger input can either be a basic event or the output of some other gate, but the 
dependent events must be (possibly replicated) basic events. The priority and gate has no 
restrictions on its two inputs. They can be basic events or the output of some other gate. Thus,
two or more priority and gates can be cascaded for more than two sequence-dependent inputs.
In the sequence-enforcing gate, all inputs except the first must be (possibly replicated) basic
events. The first input can be a basic event or the output of some other gate. Like priority and
gates, sequence-enforcing gates can also be cascaded. Both gates are cascaded from the left.
Note that the gates cannot be cascaded from the right. (See fig. 9.)

The inv and xor gates were also implemented. Including these gates in a fault tree produces a
noncoherent model, which can lead the inexperienced modeler to unexpected results. For
example, if the top gate in a fault tree is an inv gate, the reliability probabilities (numerics)
are reported as unreliability probabilities in the output. The numerics are correctly computed,
but the inv gate alters the meaning of the reporting labels. These two additional gates give the
user an extensive modeling capability. For example, researchers at Duke University proved that
the set of HARP gates establishes a mapping into the entire noncyclic (nonrepairable) homogeneous
Markov chain state space. (See ref. 36.)

Figure 9. Cascading priority and gate and sequence enforcing gate.

8 This modeling assumption is often useful for arriving at a best-case scenario that gives an upper
bound on reliability.

9 In earlier publications, this gate is also called a sequence gate. The word enforcing was added to
emphasize its function.

Using HARP as a combinatorial fault tree solver without FEHM's is computationally
inefficient, although convenient for users accustomed to HARP. When a fault tree contains
sequence dependencies, HARP provides a unique solution technique, but that technique can be
computationally expensive. The truncation technique (see section 2.8) makes these applications
considerably more practical. Fault tree entry is particularly useful for large fault-occurrence
models, especially if fault/error handling is included.

Because HARP converts a fault tree representation into a Markov chain, the user can always
alter the generated Markov chain to include behavior not captured by the fault tree. An
n-tuple notation is provided as an option to help the user identify the Markov chain states.
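The conversion of a fault tree into a Markov chain can be sketched for the simple case of replicated components and static gates. Each state is labeled with an n-tuple that counts the working units of each component type. This labeling, the breadth-first generation, and all names are assumptions for illustration and need not match HARP's own n-tuple notation or algorithm. The callable `system_failed` plays the role of the fault tree's top event.

```python
"""Sketch: generate a Markov chain from a fault-tree-like failure criterion."""
from collections import deque

FAIL = "FAIL"  # single absorbing system-failure state


def generate_chain(counts, rates, system_failed):
    """Breadth-first generation of operational states and transitions.

    counts:        initial number of working units of each component type
    rates:         per-unit failure rate of each type (same order as counts)
    system_failed: callable on a state tuple; True if the top event occurs
    Returns (set of operational states, list of (source, target, rate)).
    """
    start = tuple(counts)
    states = {start}
    transitions = []
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for i, (working, lam) in enumerate(zip(state, rates)):
            if working == 0:
                continue
            successor = state[:i] + (working - 1,) + state[i + 1:]
            target = FAIL if system_failed(successor) else successor
            transitions.append((state, target, working * lam))
            if target != FAIL and successor not in states:
                states.add(successor)
                queue.append(successor)
    return states, transitions


# Hypothetical system: 3 processors (2 needed) and 2 buses (1 needed).
lam_p, lam_b = 1.0e-4, 1.0e-5
states, transitions = generate_chain(
    counts=[3, 2], rates=[lam_p, lam_b],
    system_failed=lambda s: s[0] < 2 or s[1] < 1)
for source, target, rate in transitions:
    print(source, "->", target, rate)
```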

HARP assumes that mission time is expressed in hours, even though most fault/error handling
models use a time scale of seconds. The user must therefore express the FEHM time units as
specified by HARP. Chapter 3 gives more detailed information on how to input the two FORM
types into HARP.

2.6. Single Fault/Error Handling Model (FEHM) 

The general form of the single FEHM is shown in figure 10. The detailed fault recovery
models capture, in a few parameters, the sequence of events that occurs within the system once
a fault occurs. (See figs. 11 to 13.) A fault can be permanent (always present and capable
of producing errors, e.g., a broken connection), transient (present for only a short time, e.g.,
a glitch in the power line), or intermittent (always present but not always active, e.g., a loose
connection).

All FEHM's defined later in this section use time units of seconds, to emphasize that these
events are fast events. The one exception is the CARE III FEHM (section 2.6.7), which uses time
units of hours for consistency with the program CARE III.

The FEHM is a connected group of fast states that HARP automatically replaces by a branch point.
Its general structure is a single-entry model with up to four exits, entered when a fault occurs.
The exits correspond to transient restoration (R), permanent coverage (C), single-point failure
(S), and near-coincident fault failure (N). The N exit is taken when a second, independent fault
occurs before another exit is reached. The user describes the FEHM's as three-exit models (the R,
C, and S exits), and the HARP program automatically adds the near-coincident failure exit.

Figure 10. Fault/error handling model.

Many choices are available for the specification of the FEHM, ranging from simple exit
probabilities (VALUES FEHM) to a detailed ESPN FEHM. A different FEHM can also be
specified in selected states by using the overriding FEHM option. (See section 4.7.) The FEHM's
are solved in isolation, and the probabilities for each of the four exits are determined before
the Markov chain is solved.
2.6.1. No Coverage Model

The NONE option specifies no coverage model. This option is chosen if the user wants to 
assume perfect fault coverage for a particular component type. 


2.6.2. Values Model 

For a particular component type, the user specifies the exit probabilities of the FEHM directly
as values. Near-coincident faults are considered for those transitions that have VALUES as the
coverage model type.

2.6.3. Probabilities and Moments Model 

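Under this option, the user specifies the exit probabilities and the moments of the time to each
exit (given that the exit is reached). The probability of an independent near-coincident fault is
derived from these data. With this option, the FEHM can be visualized as a single
semi-Markovian (fast) state that HARP aggregation methods reduce to a branch point.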

2.6.4. Probabilities and Distributions Model 

Under this option, the user specifies the probabilities of transient restoration, permanent
coverage, and single-point failure. The distribution of time to each exit is specified as one of
the following: constant, uniform, exponential, hypoexponential, hyperexponential, gamma, or Weibull.
The near-coincident fault exit probability is automatically derived from these data. The
FEHM can again be visualized as a single semi-Markovian (fast) state that HARP aggregation
methods reduce to a branch point.

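2.6.5. Probabilities and Empirical Data Model

Under this option, the exit probabilities are specified together with empirical data for the times
to exit, in seconds. HARP aggregation methods then reduce the FEHM to a branch point.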


2.6.6. ARIES Transient Fault Recovery Model


The ARIES transient fault recovery model (ref. 39) represents a multiphase recovery process
that executes N successive recovery phases. (See fig. 11.) A transition to the next phase takes
place when recovery in the current phase is unsuccessful. If transient recovery is unsuccessful
after all N phases, a permanent recovery process is initiated.

Figure 11. ARIES fault recovery model.

The time unit for this model is seconds.

2.6.7. CARE III Coverage Model 

This option uses a Markov version of the CARE III single-fault model. The CARE III model
(fig. 12) can be used to model permanent, transient, and intermittent faults. A fault alternates
between an active state and a benign state (at rates α and β). In the active state, a fault is
both detectable (at rate δ) and capable of producing an error (at rate ρ). Once an error is
produced, either the fault (error) is detected (probability q), or the error propagates to the
output (at rate ε) and causes system failure. The model is incorporated internally for each fault
type that is assigned a nonzero probability. For the permanent fault model, α and β are zero. For
the transient model, α is positive and β is zero. Because q is a probability, it lies between 0 and
1. Because ε, δ, and ρ are rates, they must be nonnegative. The CARE III FEHM is the only FEHM for
which the time unit is hours. The fault-type models are solved individually and are combined
according to the probabilities assigned to each fault type.


2.6.8. ESPN Model 

The HARP ESPN model is discussed in references 38 and 41 to 43 and is shown in figure 13.
It models three aspects of a fault: physical fault behavior, transient recovery, and permanent
recovery. The fault behavior model captures the physical status of the fault. This includes
whether the fault is active or benign (if permanent or intermittent) and whether the fault still
exists (if transient). Once the fault is detected, it is temporarily assumed to be transient, and
an appropriate recovery procedure can begin. The transient recovery procedure can be attempted
more than once. If the detection-recovery cycle is repeated too many times, the fault is assumed
to be permanent, and permanent recovery (reconfiguration) is invoked. If the reconfiguration
succeeds, the system is again operating correctly, although in a somewhat degraded mode.



The user inputs to this model are the distribution of time (T1 to T14) for each activity and any
associated parameters for the distribution, with a time unit of seconds. (The distributions need
not be exponential.) Also requested are the probabilities of correct error detection, fault
detection (d), fault isolation (f), and reconfiguration (r). The user must also specify the
allowable error for the simulation. (A value in the range of 2 to 5 percent is suggested.)11 The
distributions available for individual transitions in the ESPN model are constant, n-stage Erlang,
exponential, log-normal, normal, Rayleigh, uniform, and Weibull. For more information on these
distributions, see Trivedi.

The ESPN is the only FEHM that is solved by simulation. During the simulation, a statistical
analysis of the simulation data is performed. Confidence intervals about the exit probabilities
are generated and compared with the allowable error. If a confidence interval is too wide, the
number of trials is doubled. When the simulation reaches the desired accuracy, the results
are appended to the parameter file. If the user does not change the inputs to the ESPN model in
this file, the file can be reused with the same simulation results, which avoids rerunning the
simulation each time. However, if the user has manually changed the inputs with a
text editor, the previous simulation results must be discarded. That is, the lower portion of the
parameter file must be deleted, and harpeng must then be rerun. (See vol. 2 of this TP.)

The simulation of the model uses a random seed value derived from the system time.
This method helps ensure a random simulation. However, it also means that simulation
runs are not exactly reproducible. Subsequent simulations cannot match earlier runs exactly,
but multiple runs should agree to within the accuracy and confidence requested. For the Convex
platform, the user must uncomment the line SEED = 0, make other changes in the harpsim source
file, and recompile the code.
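The confidence-interval check and trial doubling described above can be sketched as follows. The simulated FEHM is a hypothetical race between three exponential exits, not the ESPN itself. Several further choices are assumptions and may differ from HARP's own statistics: the allowable error is treated as an absolute half-width on each exit probability, a normal-approximation interval is used, and all trials are rerun at each doubling. Passing a fixed seed makes runs reproducible. With seed=None, Python seeds from operating-system randomness or the system time, which mirrors the nonreproducibility noted above.

```python
"""Sketch: estimate FEHM exit probabilities by simulation, doubling trials
until every confidence interval is narrow enough (hypothetical setup)."""
import math
import random


def toy_fehm(rng):
    """Hypothetical FEHM: three exits race with exponential times (per second)."""
    times = {"R": rng.expovariate(30.0), "C": rng.expovariate(69.0),
             "S": rng.expovariate(1.0)}
    return min(times, key=times.get)


def estimate_exits(simulate_once, exits, allowable_error, initial_trials=1000,
                   z=1.96, seed=None, max_trials=1_000_000):
    """Return (probabilities, CI half-widths, trials used)."""
    rng = random.Random(seed)
    trials = initial_trials
    while True:
        counts = dict.fromkeys(exits, 0)
        for _ in range(trials):
            counts[simulate_once(rng)] += 1
        probs = {e: c / trials for e, c in counts.items()}
        # Normal-approximation half-width of each binomial proportion.
        half = {e: z * math.sqrt(p * (1 - p) / trials) for e, p in probs.items()}
        if max(half.values()) <= allowable_error or trials >= max_trials:
            return probs, half, trials
        trials *= 2  # interval too wide: double the number of trials


probs, half, n = estimate_exits(toy_fehm, ["R", "C", "S"], allowable_error=0.02, seed=12345)
print(n, {e: round(p, 3) for e, p in probs.items()})
```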

For this model, the coverage factor for transient restoration is the probability of a token
reaching the place labeled Transient Recovery. (See fig. 13.) Coverage is the probability of a
token reaching the place labeled Permanent Recovery, and single-point failure is the probability
of a token reaching the place labeled Single-Point Failure. The fourth factor corresponds to
the N exit and represents a near-coincident fault. It is derived from the relative passage times to
the three exits and is discussed in section 3.2.1. For a more detailed description of the ESPN
model, refer to the tutorial (vol. 2 of this TP).

2.7. Multifault FEHM Near-Coincident Fault Rate Specification 

HARP provides a number of detailed single-fault models. For modeling coexisting synergistic
multiple faults, however, HARP provides only three simple multifault models for use with
behavioral decomposition. These models are computationally efficient and automatically
generated. Detailed modeling of multiple faults can be computationally expensive and tedious to
specify, because it requires the user to input data that are typically unavailable.

The advantage of the simple HARP multifault models is a significant reduction in model
and input data complexity, because the models are computationally fast and automatically
generated. The disadvantage is a reduction in accuracy, which experience has shown is
typically acceptable (refs. 23 to 26, 41, and 45). The increased deviation that results from these
simple multifault models always falls on the conservative side, as long as the significant system
failure modes are properly modeled. The amount of deviation, and hence the degree of conservatism,
depends on the system. A measure of the degree of conservatism can often be obtained from HARP's
optimistic simple bound. (See section 3.4 for a discussion of upper and lower bounds.)

If greater accuracy is needed, the user can enter the model directly as a Markov chain. This
approach avoids the multifault coverage approximations and produces greater accuracy at the cost
of greater execution times. The user can also use XHARP, which automatically generates a detailed
multifault model. When the system components have different failure rates, near-coincident fault
interactions can be defined by using the SAME option or possibly the USER option.

2.7.1. ALL-Inclusive Near-Coincident Fault Rate 

The ALL model produces a conservative result when the correct failure rate and recovery rate
data are used. However, because all pairs of failures are assumed to cause system failure, the
degree of conservatism can be large for certain systems. (See chapter 7.) The optimistic simple
bound can often be used to quantify the conservative deviation. (See section 3.4.) When all system
components have the same failure rate, the ALL and SAME models are identical in effect.

Conservatively, we assume that a second fault in any component, occurring while the system is
attempting to handle a first fault, causes system failure. The near-coincident fault rate is
determined as follows.

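Let the exit rates of source state i be $\sum_r k_r \lambda_r$ and those of destination state j be
$\sum_r l_r \lambda_r$. Then a FEHM placed on an arc with rate $k_I \lambda_I$ has a
near-coincident fault rate (NCFR) given by the following:

$$\mathrm{NCFR} = \sum_{r \neq I} \max(k_r, l_r)\,\lambda_r + \max(k_I - 1, l_I)\,\lambda_I$$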


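Figure 14 illustrates how the near-coincident fault rate is computed for the multifault models in
this section. The program fiface determines the near-coincident fault rate expressions
automatically. If a rate parameter cannot be parsed because it contains unknown variables or
added constants, fiface uses a look-ahead method to calculate all rates. For these state
declarations, the all-inclusive near-coincident fault rate (based on the look-ahead method) is
simply the sum of the outgoing arcs of the destination state:

$$\mathrm{NCFR} = \sum_r l_r \lambda_r$$

When only the sum of the outgoing arcs is considered, a warning is issued stating that the results
may not be conservative.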



Figure 14. NCFR computation. 
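The ALL-inclusive rate and its look-ahead fallback can be written directly from the two expressions in section 2.7.1. In this sketch, the counts k_r and l_r and the rates λ_r are passed as dictionaries keyed by component type. The type names, numbers, and function names are hypothetical.

```python
"""Sketch of the ALL-inclusive near-coincident fault rate (NCFR)."""


def ncfr_all(k, l, rates, failing_type):
    """NCFR for a FEHM on the arc k[I]*rates[I], where I = failing_type.

    k[r], l[r]: units of type r in the source and destination states.
    """
    total = 0.0
    for r, lam in rates.items():
        if r == failing_type:
            total += max(k.get(r, 0) - 1, l.get(r, 0)) * lam
        else:
            total += max(k.get(r, 0), l.get(r, 0)) * lam
    return total


def ncfr_all_lookahead(l, rates):
    """Look-ahead fallback: sum of the outgoing rates of the destination state."""
    return sum(l.get(r, 0) * lam for r, lam in rates.items())


rates = {"P": 1.0e-4, "B": 1.0e-5}  # per hour (hypothetical)
source, destination = {"P": 3, "B": 2}, {"P": 2, "B": 2}
print(ncfr_all(source, destination, rates, "P"))
print(ncfr_all_lookahead(destination, rates))
```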


2.7.2. SAME-Type Near-Coincident Fault Rate 

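We can assume that only near-coincident faults in components of the same type as the faulty
component cause system failure while the system is attempting to handle a single fault. This
assumption is appropriate when the system is composed of subsystems whose components have
identical failure rates and are coupled so that near-coincident faults within a subsystem cause
failure, whereas faults in other subsystems do not. HARP automatically generates the required
multifault model as follows.

Let the exit rates of source state i be $\sum_r k_r \lambda_r$ and those of destination state j be
$\sum_r l_r \lambda_r$. Then a FEHM placed on an arc with rate $k_I \lambda_I$ has a
near-coincident fault rate given by the following: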

$$\mathrm{NCFR} = \max(k_I - 1, l_I)\,\lambda_I$$

The program fiface determines the near-coincident fault rate expressions automatically. If
a rate parameter cannot be parsed because it contains unknown variables or added constants,
fiface uses a look-ahead method to calculate all rates. For these state declarations, the
same-type near-coincident fault rate (based on the look-ahead method) is the sum of the same-type
rates leaving the destination state:

$$\mathrm{NCFR} = \sum l_I \lambda_I$$

When only the sum of the outgoing arcs is considered, a warning is issued stating that the
results may not be conservative. Unlike the ALL model, the SAME model can automatically
drop failure modes for certain system models. The user is cautioned to ensure that no important
failure modes are dropped. Otherwise, a nonconservative result can be given. (See chapter 7.)
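For contrast with the ALL rule, the sketch below computes the SAME-type rate, which keeps only the term for the failing component type. The difference between the two values is exactly the contribution from other component types that the SAME model leaves out. This is how the SAME model can drop failure modes. The component types, rates, and function names are hypothetical.

```python
"""Sketch of the SAME-type near-coincident fault rate, compared with ALL."""


def ncfr_same(k, l, rates, failing_type):
    """Only faults in components of the failing type I are near-coincident."""
    return max(k.get(failing_type, 0) - 1, l.get(failing_type, 0)) * rates[failing_type]


def ncfr_all(k, l, rates, failing_type):
    """ALL-inclusive rate, for comparison (faults in every type count)."""
    return sum(
        max(k.get(r, 0) - (1 if r == failing_type else 0), l.get(r, 0)) * lam
        for r, lam in rates.items())


rates = {"P": 1.0e-4, "B": 1.0e-5}  # per hour (hypothetical)
source, destination = {"P": 3, "B": 2}, {"P": 2, "B": 2}
same = ncfr_same(source, destination, rates, "P")
everything = ncfr_all(source, destination, rates, "P")
print(same, everything, "ignored by SAME:", everything - same)
```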

2.7.3. USER-Defined Near-Coincident Fault Rate 

For some models, the user may want to define explicitly, for each component, which other
components can interfere with recovery from a fault. In this case, the user-defined near-coincident
fault rate for the FEHM between operational states depends on the user input. For example, suppose
we have a system consisting of three processors, P1, P2, and P3 (all distinct, with unique failure
rates), a voter, and a bus. Suppose further that the processors are connected, from the
monitoring point of view, in a ring network. Processor P1 detects errors and performs recovery for
processor P2, processor P2 likewise monitors P3, and P3 monitors P1. Thus, a failure in processor P1
can interfere with recovery in processor P2. Similarly, a failure in processor P2 can interfere with
recovery in P3. Because the processors are connected by the data bus, a bus failure can interfere
with recovery on any of the processors. The bus does not rely on any other component for recovery.
The voter is self-checking, so no faults interfere with recovery from voter faults. The all-inclusive
and the same-type fault rates cannot capture this behavior. It is captured by declaring that
recovery in P1 depends on P3 and the bus, recovery in P2 depends on P1 and the bus, and recovery in
P3 depends on P2 and the bus. HARP automatically generates the required multifault model as follows.

Let the exit rates of source state i be $\sum_r k_r \lambda_r$ and those of destination state j be $\sum_r l_r \lambda_r$. Also let $D_l$ be the set of interfering component types for component type $l$. Then, a FEHM placed on an arc with rate $k_l \lambda_l$ has a near-coincident fault rate given by the following:

$$\mathrm{NCFR} = \sum_{r \neq l,\; r \in D_l} \max(k_r, l_r)\,\lambda_r + \max(k_l - 1,\; l_l)\,\lambda_l\, I_{(l \in D_l)}$$

where $I$ is an indicator function that takes on value 1 if the subscript expression is true and 0 otherwise.

The near-coincident fault rate expressions are determined automatically in program fiface. If a rate parameter cannot be parsed because it contains unknown variables or added constants, fiface uses a look-ahead method to calculate all rates. For these state declarations the user-defined near-coincident fault rate (based on the look-ahead method) is given by the following:

$$\mathrm{NCFR} = \sum_{r \in D_l} l_r \lambda_r$$

When only the sum of the outgoing arcs is considered in determining near-coincident fault rates, a warning is issued stating that the results may not be conservative. Like the SAME multifault model, the incorrect use of the USER model could produce a nonconservative result when important failure modes are dropped. (See section 3.3.2.)
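The USER rule for operational states can be written as a short function. The sketch below is illustrative only: the function name `user_ncfr`, the dictionary layout, and the numeric rates are hypothetical, and it evaluates the formula for a given set D_l rather than reproducing the parsing done by fiface.

```python
def user_ncfr(l_type, k, l, rate, interferers):
    """USER near-coincident fault rate for a FEHM on an arc with rate k[l_type] * rate[l_type].

    l_type      -- component type whose fault is being recovered
    k, l        -- dicts: component type -> count in the source / destination state
    rate        -- dict: component type -> per-component failure rate (lambda_r)
    interferers -- set D_l of component types that can interfere with recovery
    """
    total = 0.0
    for r in interferers:
        if r != l_type:
            total += max(k.get(r, 0), l.get(r, 0)) * rate[r]
    if l_type in interferers:
        # Indicator term: one unit of l_type is already faulty, hence k - 1.
        total += max(k.get(l_type, 0) - 1, l.get(l_type, 0)) * rate[l_type]
    return total


# Ring example: recovery in P2 depends on P1 and the bus (hypothetical rates per hour).
rates = {"P1": 1e-4, "P2": 2e-4, "P3": 3e-4, "BUS": 1e-5, "VOTER": 1e-6}
k = {"P1": 1, "P2": 1, "P3": 1, "BUS": 1, "VOTER": 1}  # source state: all up
l = dict(k, P2=0)                                       # destination state: P2 lost
print(user_ncfr("P2", k, l, rates, {"P1", "BUS"}))      # lambda_P1 + lambda_BUS
```

For the voter, D_l is empty, so the function returns zero, matching the statement that no faults interfere with recovery from voter faults.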

2.7.4. Exact Specification of Near-Coincident Fault Rates 

The ALL, SAME, and USER multifault models are provided to automatically generate the near-coincident fault rates. This automatic capability is extremely useful for all but trivial models; however, this convenience has a trade-off. The automatic multifault model is not capable of generating exact near-coincident fault rates for all possible Markov chains. In some cases, an approximating model such as the ALL model must be chosen to ensure a conservative result. The user has an alternative approach if exact rates are desired and cannot be generated by the automatic multifault model. The user can derive these rates manually and enter them for HARP in the MODELNAME.ALL, MODELNAME.SAM, or MODELNAME.USR files. (See section 4.2.) These ASCII files are readable and easily modified.

Section 4.2.7 shows the format for the near-coincident fault rates. The expressions for the rates depend on the Markov chain of interest. Section 3.2.1 gives some insight into how near-coincident fault rates are related to the coverage, C_i, parameters.

2.7.5. Multiple-Run Near-Coincident or No Near-Coincident Faults 

In fiface or harpeng, the user can ignore any near-coincident faults. By specifying no near-coincident faults in fiface, the system model is much smaller. This selection may be necessary for extremely large models. (The 16-bit version of PC HARP does not allow the specification of near-coincident fault rates because of the DOS 640K memory limitation. Other versions do not have this restriction.) Otherwise, if the user wants to exercise several different near-coincident fault type options, NONE can be specified during the execution of harpeng.

Thus, whether the user chooses the Markov chain or fault tree option for specifying the system structure, the near-coincident fault rates for each instance of a fault/error handling model are generated automatically. During execution of fiface, the user is asked whether the all-inclusive, same-type, user-defined, or no near-coincident fault rate should be used, and which combination of these, if any, is to be used in successive harpeng runs, for example, ALL, SAME, USER. In this way all options can be exercised during different harpeng execution runs. A discussion of the various near-coincident fault rate options can be found in volume 2 of this technical paper.

As previously mentioned, an alternative is to model the systems with the AS IS Markov 
solution technique. This choice produces greater accuracy at greater execution times. Recovery 
behavior can appear to require a Markovian submodel; however, non-Markovian recovery can 
be approximated with the method of stages (refs. 44, 46, and 47). 
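To illustrate the method of stages, a fixed recovery delay of mean T can be approximated by n exponential stages in series, each with rate n/T (an Erlang distribution). The sketch below uses hypothetical helper names; it shows that the mean stays at T while the variance T^2/n shrinks as stages are added, so the staged delay approaches a deterministic one while every stage remains Markovian.

```python
import math


def erlang_stage_rate(mean_delay, n_stages):
    """Rate of each exponential stage so that n stages in series have the given mean."""
    return n_stages / mean_delay


def erlang_moments(mean_delay, n_stages):
    """Mean and variance of the n-stage (Erlang) approximation."""
    rate = erlang_stage_rate(mean_delay, n_stages)
    return n_stages / rate, n_stages / rate ** 2


def erlang_cdf(t, mean_delay, n_stages):
    """Probability that the n-stage delay has finished by time t."""
    x = erlang_stage_rate(mean_delay, n_stages) * t
    partial = sum(x ** i / math.factorial(i) for i in range(n_stages))
    return 1.0 - math.exp(-x) * partial


# Hypothetical fixed recovery delay of 2 time units, checked at t = 1 (half the delay).
for n in (1, 4, 16, 64):
    mean, var = erlang_moments(2.0, n)
    print(n, mean, var, erlang_cdf(1.0, 2.0, n))
```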

2.8. Truncation of Model Entered as Fault Tree 

Frequently, even for a simple model, a large number of states and transitions are produced upon conversion from a fault tree to a Markov model. This largeness problem is encountered despite the use of behavioral decomposition. To solve a large model, HARP provides a state truncation technique: the user can specify a number of total component faults beyond which the model is truncated. The states beyond this truncation level are aggregated into truncation aggregation (TA) states. Bounds on the system unreliability are then obtained by first assuming that all TA states are operational and then assuming that all TA states are failure states. (See section 3.4.2.) We expect the truncation bounds of the model to become tighter as the truncation level increases. In the limit, as the truncation level increases, the truncated model becomes identical to the full model, and the bounds from the truncated model converge to the exact reliability value obtained from the full model.
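A small worked example may make the truncation bounds concrete. The sketch below is an illustration under stated assumptions, not HARP's algorithm: it uses the three-processor two-memory system with perfect coverage and constant rates, lumps all failure states into one absorbing state F, counts the truncation level in total component faults, and solves each chain by uniformization. The rate values and function names are hypothetical.

```python
import math

N_PROC, N_MEM = 3, 2  # system is up while >= 1 processor and >= 1 memory work


def build_chain(lam, mu, trunc_level=None):
    """Perfect-coverage 3-processor/2-memory chain, optionally truncated.

    Up states are (procs_up, mems_up); "F" is an absorbing failure state.
    With trunc_level set, up states with more than trunc_level total
    component faults are folded into an absorbing "TA" state.
    Q[i][j] is the rate from state i to state j (row convention).
    """
    up = [(p, m) for p in range(N_PROC, 0, -1) for m in range(N_MEM, 0, -1)
          if trunc_level is None or (N_PROC - p) + (N_MEM - m) <= trunc_level]
    states = up + ["F"] + (["TA"] if trunc_level is not None else [])
    idx = {s: i for i, s in enumerate(states)}
    Q = [[0.0] * len(states) for _ in states]
    for p, m in up:
        for rate, nxt in ((p * lam, (p - 1, m)), (m * mu, (p, m - 1))):
            if 0 in nxt:
                target = "F"      # redundancy exhausted
            elif nxt in idx:
                target = nxt
            else:
                target = "TA"     # beyond the truncation level
            i, j = idx[(p, m)], idx[target]
            Q[i][j] += rate
            Q[i][i] -= rate
    return states, Q


def transient(Q, p0, t, tol=1e-13, max_terms=10_000):
    """State probabilities at time t by uniformization (p0 is a row vector)."""
    n = len(Q)
    q = max(-Q[i][i] for i in range(n)) or 1.0
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / q for j in range(n)] for i in range(n)]
    term, weight = list(p0), math.exp(-q * t)
    result, acc, k = [weight * x for x in term], weight, 0
    while 1.0 - acc > tol and k < max_terms:
        k += 1
        term = [sum(term[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= q * t / k
        acc += weight
        result = [r + weight * x for r, x in zip(result, term)]
    return result


def unreliability_bounds(lam, mu, t, trunc_level=None):
    """(lower, upper): TA treated as operational, then as failed."""
    states, Q = build_chain(lam, mu, trunc_level)
    p0 = [1.0 if s == (N_PROC, N_MEM) else 0.0 for s in states]
    prob = dict(zip(states, transient(Q, p0, t)))
    lower = prob["F"]
    return lower, lower + prob.get("TA", 0.0)


exact, _ = unreliability_bounds(1e-3, 5e-4, 1000.0)
for level in range(4):
    lo, hi = unreliability_bounds(1e-3, 5e-4, 1000.0, trunc_level=level)
    print(level, lo, exact, hi)
```

Because the components in this sketch are independent, the full-model value also equals the closed form 1 - [1 - (1 - e^(-lambda t))^3][1 - (1 - e^(-mu t))^2], which gives a check on the solver; at truncation level 3 every up state is kept and the two bounds coincide.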

2.9. FORM Model Parameters 

The FORM model parameters determine the nominal unreliability prediction, and the variation about the nominal parameter values determines the bounds. The following sections refer to the various FORM parameter types.

2.9.1. Failure Rate Specification 

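When the FORM model is built, a failure rate can be specified either as a constant (exponential) rate or as a time-dependent Weibull hazard rate. Two forms of the Weibull hazard rate are available in HARP. These two forms are defined as follows.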


$$h(t) = \lambda \alpha t^{\alpha - 1}$$

$$h(t) = \lambda \alpha (\lambda t)^{\alpha - 1} = \alpha \lambda^{\alpha} t^{\alpha - 1}$$


r r. v sijort^ 

- - - - 

the screen. (See chapter 7 for details and appendix D for warning messages.)
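The two hazard forms can be compared numerically. In this sketch (function names and rate values hypothetical), both forms reduce to the constant rate lambda when alpha = 1; for other alpha they differ in whether lambda scales the hazard directly (first form) or scales time (second form).

```python
def weibull_hazard_1(t, lam, alpha):
    """First form: h(t) = lam * alpha * t**(alpha - 1)."""
    return lam * alpha * t ** (alpha - 1)


def weibull_hazard_2(t, lam, alpha):
    """Second form: h(t) = lam * alpha * (lam * t)**(alpha - 1) = alpha * lam**alpha * t**(alpha - 1)."""
    return lam * alpha * (lam * t) ** (alpha - 1)


# t must be > 0 when alpha < 1 (the hazard is unbounded at t = 0).
for alpha in (0.5, 1.0, 2.0):
    print(alpha, weibull_hazard_1(100.0, 1e-4, alpha), weibull_hazard_2(100.0, 1e-4, alpha))
```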


2.9.2. Repair Rate Specification 


rute^::^^ 

warning is issued in this case. . 

The user is reminded that a nonhomogeneous Markov chain has only one time variable, the global mission time, which is not reset when a repair is completed. If the user wants to reset the clock to time zero or some other time for a repaired component, a more powerful model solver is required, as the resulting model is non-Markovian. MCI-HARP is designed to cover such models.


2.9.3. User-Specified Coverage Parameters 


If no FEHM's are used in the system being modeled, the user can specify coverage parameters directly. These parameters can be the coverage factor C, the transient restoration factor R, and the single-point failure factor S.

A Weibull spare is precluded from failing even though the component failure history is reset to mission time zero.
This feature can be useful as an optimistic estimate. 
Chapter 3

Model Solution 


3.1. Conversion of Fault Tree to Markov Chain 

A fault tree model is internally converted to a Markov chain for solution after it has been input to HARP. Figure 15 is an example of a fault tree for a system with three processors and two memory units. When using textual HARP, the user normally labels a fault tree with unique node labels. The node labels are assigned to basic events, gates, and the FBOX symbol, which represents the system failure events. The order of specifying the node labels is not important, but uniqueness is. The GO program automatically assigns the node values.


Dictionary File

Component Logical Name    Symbolic Failure Rate    FEHM File Name
Processor                 Lambda                   Fehm.car
Memory                    Mu                       None

Fault tree: Node 6 (top), Node 5, Nodes 3 and 4, Nodes 1 and 2 (basic events)

Figure 15. Three-processor two-memory system dictionary file and fault tree.

While inputting the model, the user is requested to create a dictionary file that is associated with the fault tree model. The contents of the dictionary file are shown in figure 15. In this example, the user entered the information in the last two rows of the file, with the exception of indices 1 and 2. The program assigns these indices to the component logical names (symbols $ and & are not allowed).* In the fault tree, the user must specify the notation identifying the component types. In figure 15, these are shown in the fault tree as the numbers to the right of the * symbols. For node 1, the 3*1 means that there are three identical type one components, each with the same failure rate symbol Lambda and FEHM described by Fehm.car. Because the basic events are identical replications of a component type, the 3*1 notation reduces the size of the model: the three processors are represented by one basic event with a single rate symbol instead of three separate ones. In figure 16, the effect of the 3*1 notation is to assign the failure rate 3λ to the transition from (3,2) to (2,2). The user must differentiate component type indices from node indices. A component type index can be assigned to any node; uniqueness is not required. However, uniqueness is required for the node indices.

The fault tree in figure 15 is converted by HARP into the Markov chain shown in figure 16. Each required combination of operational components becomes a state in the Markov chain. Note that the basic event * notation has reduced the number of state combinations. The user does not have to delineate all possible combinations, only those that are required. In this model, 32 combinations are possible, but far fewer are required. Exhaustion of redundancy failure states are also generated.

* Avoid special characters because they often interfere with the operating system.

Also note that only a nonrepairable system can be specified by means of a fault tree. (Fault tree models with repair have not yet been developed.) For availability prediction, the user must either input the model directly as a Markov chain or specify a fault tree model and then subsequently modify the Markov chain MODELNAME.INT file to include the repair transitions with a text editor. For details of the algorithm used for conversion, see chapter 6 and reference 49.
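The conversion for this example can be sketched as a breadth-first expansion of component counts. This is a hypothetical illustration of the idea, not the algorithm of chapter 6: it assumes perfect coverage, constant rates, and a single absorbing failure state F, and it prints transitions in the style of the MODELNAME.INT listing.

```python
from collections import deque


def fault_tree_to_chain(n_proc=3, n_mem=2, proc_rate="LAMBDA", mem_rate="MU"):
    """Breadth-first generation of a perfect-coverage Markov chain.

    The system (as in the example fault tree) fails when all processors OR
    all memories have failed. Up states are (procs_up, mems_up); a single
    absorbing failure state "F" is used here for simplicity.
    Returns (state_numbers, transitions) with 1-based state numbers.
    """
    start = (n_proc, n_mem)
    number = {start: 1}
    transitions = []
    queue = deque([start])
    while queue:
        state = queue.popleft()
        p, m = state
        for count, sym, nxt in ((p, proc_rate, (p - 1, m)), (m, mem_rate, (p, m - 1))):
            target = "F" if 0 in nxt else nxt
            if target not in number:
                number[target] = len(number) + 1
                if target != "F":
                    queue.append(target)
            rate = sym if count == 1 else f"{count}*{sym}"
            transitions.append((number[state], number[target], rate))
    return number, transitions


numbers, arcs = fault_tree_to_chain()
for src, dst, rate in arcs:
    print(src, dst, f"{rate};")
```

Counting components instead of tracking individual units is what keeps the chain small: this sketch produces 7 states rather than one per combination of the 5 components.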

3.2. Modeling Imperfect Coverage 

The possibility of imperfect fault coverage is automatically incorporated into the FORM model Markov chain as follows. Through the dictionary, each component type in the system can have associated with it a fault/error handling model that describes the recovery behavior of that component. The three-processor two-memory system shown as a Markov chain in figure 16 is used to demonstrate the idea of imperfect coverage. Components of type 1 are the processors, one of which must be operational for the system to remain up. Likewise, one component of type 2 (the memories) is necessary for operation. Processors fail with rate λ and memories with rate μ. For our example, the states are labeled with a pair of numbers, the first signifying the number of operational processors and the second signifying the number of operational memory components. Once the number of processors is exhausted, a failure state is entered. Once the number of memory components is exhausted, a failure state is likewise entered. When a component type has specified coverage, the HARP program automatically places a FEHM on the appropriate arcs, as shown in figure 17. HARP prompts the user for the dictionary information, including the FEHM model to be used for a processor failure; this model is used for FEHM numbers 1, 2, 6, and 7. The FEHM model for memory failures is used in FEHM numbers 3, 4, and 5. (The contents of boxes 1, 2, 6, and 7 are identical but may differ from the contents of boxes 3, 4, and 5, which are also identical to each other.) (However, the user may override a FEHM on a specific arc; see section 4.6.2 and chapter 7.) The difference in the FEHM's (i.e., why they are numbered 1 to 7 rather than just 1 and 2) is in the near-coincident fault rate used to calculate the N's.
3.2.1. Automatic Incorporation of Coverage Models


Each FEHM is solved in isolation for the exit probabilities of its three exits (R, C, and S), and one additional exit is used to represent near-coincident faults (N).



More specifically, assume that a fault of component type 1 in state (i, j) leads to state (i-1, j). When the FEHM on this arc is replaced by its exit probabilities, the transition to state (i-1, j) occurs with probability C_(i,j),(i-1,j), the single-point failure state is reached with probability S_(i,j),(i-1,j), a transition back to state (i, j) occurs with probability R_(i,j),(i-1,j), and the near-coincident failure state is reached with probability N_(i,j),(i-1,j). The Markov chain is then modified by first multiplying the original rate from state (i, j) into the FEHM of component type 1 by C_(i,j),(i-1,j) and then adding arcs from state (i, j) to the failure states. These additional arcs represent the single-point and near-coincident failures that result from imperfect coverage.
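The arc replacement can be sketched as follows, assuming the four FEHM exit probabilities C, R, S, and N sum to one. The function name and the output state labels are hypothetical; the R branch is omitted because a transition back to the source state has no effect on the chain.

```python
def split_arc(rate, c, r, s, n):
    """Replace one FEHM-bearing arc by instantaneous-jump arcs.

    rate       -- original arc rate out of state (i, j), e.g. k * lambda
    c, r, s, n -- FEHM exit probabilities (covered, transient restoration,
                  single-point failure, near-coincident failure)
    Returns the rates of the new arcs; the R branch returns to the source
    state, so it adds no arc.
    """
    total = c + r + s + n
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"exit probabilities sum to {total}, expected 1")
    return {"next_state": rate * c, "FSPF": rate * s, "FNCF": rate * n}


# Hypothetical arc 3*lambda with lambda = 1e-4 per hour.
print(split_arc(3e-4, 0.99, 0.005, 0.004, 0.001))
```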
These coverage failure states can be differentiated
state, if the user desires a comparison of ^ 5 figure 15, where 

imperfect coverage representation of e thr P , represents the failure of the system 

FSPF represents the single-pom Mure ^ fetors ^ single subscri pts for 

Figure 18 is an approximation (see section 8.1) to the stochastic process represented by figure 17.



The state diagram of figure 17 is automatical^ 
solves this Markov chain for state probabilities. the stochastic process 

5, and N Parameters Jf P^ se large and generally non-Markovian. For 

typeschosen ar senm^ovfe, " — ” IS 

is either semi-Markov (if all failure ra es a simulated FEHM is chosen, the stochastic 

failure rates is chosen to be Weibull). Similarly, lf E veu wlmn a Markovian 
process of figure 17 will be more genera an a se generally a very stiff Markov 

FEHM is chosen, the process represented by figure 18 i (% 8 J ration an d 

** — "°” s - to be used 

in the solution of the FEHM’s (explained in section 2.7) . 

The derivation of the C, R, S, and N parameters can be illustrated by way of a simple example system architecture that is a variation of the three-processor two-memory system, that is, consideration of only the three-processor part of the three-processor two-memory system. Again, we automatically incorporate the possibility of imperfect coverage into the perfect coverage Markov chain, as shown in our three-processor example (fig. 19). In FEHM 1, recovery (an exit through C or S) must be completed before the occurrence of a fault in one of the two remaining processors (which is an exponentially distributed random variable with parameter 2λ) if a near-coincident fault is to be avoided.
Figure 19. Three-processor system. FEHM's with C, S, R, and N exit probabilities.

Figure 20. Three-processor system showing near-coincident faults.



“ I'"' 6 11,0 C ° VOr “ SC m “ ld d0 "° t0<l by FEHM 2 ' processor failure can occur 

rJTZX 2«f T.!u 8nd / E P M l "a," 6 ' 1 "; 19 “ ».v distributed delay., with 

c l i „ l.,t ■, 5 “ «• Notc «“* i" 'bu absence of a „ear-«, incident fault 

, L , , ,nv<v, r - with the near-coincident fault occurring at. the rate 2 * A from FFHM i 
,c probability of a n.uxcsiul C exit before the occurred of . S.-C, d a ,1 Z c n „ ' 
IS easily shown to lie ( ■< = u- j Similarlv fnr fphvi o n b u lcn ' rau " 

shown in figure 21. In figure + 2r N s = 1 - h. and W - 1 C W • 10 rwlu<; « 1 iikkIoI is 

The inclusion of interfering faults causes the coverage values to become state dependent. HARP automatically derives the coverage factors by taking the Laplace transforms of the exit-time distributions of the single-fault model and then substituting the near-coincident fault rate for the Laplace transform variable to obtain the state-dependent coverage values. If a closed form of the time-to-exit distribution is not available, the coverage values are approximated from the near-coincident fault rate and the moments of the exit-time distributions; these moments are easily obtained from empirical or simulation data. See reference 6 for the mathematical derivations.
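For a single-state FEHM, the Laplace-transform step can be checked numerically. The sketch assumes a hypothetical FEHM that leaves through exit C at rate delta or through exit S at rate sigma, with an interfering fault arriving at rate gamma = 2λ. The probability of a C exit before the interfering fault is the Laplace transform of the C-exit density evaluated at s = gamma, which here has the closed form delta/(delta + sigma + gamma); Simpson's rule is used only to confirm that value.

```python
import math


def c_exit_density(t, delta, sigma):
    """Defective density of leaving a one-state FEHM through exit C at time t.

    Hypothetical FEHM: exit C at rate delta, exit S at rate sigma.
    """
    return delta * math.exp(-(delta + sigma) * t)


def laplace_numeric(f, s, t_max, steps=20_000):
    """Composite Simpson estimate of the integral of exp(-s t) f(t) over [0, t_max]."""
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")
    h = t_max / steps
    total = f(0.0) + math.exp(-s * t_max) * f(t_max)
    for i in range(1, steps):
        t = i * h
        total += (4 if i % 2 else 2) * math.exp(-s * t) * f(t)
    return total * h / 3


delta, sigma, lam = 3600.0, 36.0, 1e-4   # hypothetical rates per hour
gamma = 2 * lam                          # two other processors can interfere


def f_c(t):
    return c_exit_density(t, delta, sigma)


c_numeric = laplace_numeric(f_c, gamma, t_max=20.0 / (delta + sigma))
c_closed = delta / (delta + sigma + gamma)
s_closed = sigma / (delta + sigma + gamma)
n_closed = gamma / (delta + sigma + gamma)  # equals 1 - C - S
print(c_numeric, c_closed, s_closed, n_closed)
```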

We need not restrict ourselves to single-state FEHM's. Let us again look at a portion of the CARE III coverage model that was introduced in section 2.6.7. (See fig. 22.)



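Now the FEHM probabilities when replaced by a branch point are as follows:

$$C_3 = \frac{\delta}{\delta + \rho + 2\lambda} + \frac{\rho}{\delta + \rho + 2\lambda} \cdot \frac{(1 - q)\,\epsilon}{\epsilon + 2\lambda}, \qquad S_3 = \frac{\rho}{\delta + \rho + 2\lambda} \cdot \frac{q\,\epsilon}{\epsilon + 2\lambda}$$

and

$$C_2 = \frac{\delta}{\delta + \rho + \lambda} + \frac{\rho}{\delta + \rho + \lambda} \cdot \frac{(1 - q)\,\epsilon}{\epsilon + \lambda}, \qquad S_2 = \frac{\rho}{\delta + \rho + \lambda} \cdot \frac{q\,\epsilon}{\epsilon + \lambda}$$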

As previously mentioned, these probabilities are determined by HARP based on the user inputs for the rates and probabilities in the model.
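The branch-point expressions can be evaluated directly. In this sketch the rate values are hypothetical, and the parameter roles in the docstring are inferred from the structure of the C and S expressions (see section 2.6.7 for the model itself). Setting the near-coincident fault rate to zero makes C + S = 1, and increasing it lowers C, which is why C3 and C2 differ.

```python
def care3_exits(delta, rho, eps, q, ncf_rate):
    """C, S and N branch probabilities for the CARE III portion shown above.

    Parameter roles are read off the structure of the C and S expressions:
    delta    -- rate of leaving the active-fault state toward a covered exit
    rho      -- rate of leaving the active-fault state toward the error state
    eps      -- exit rate from the error state
    q        -- fraction of error-state exits that are single-point failures
    ncf_rate -- near-coincident fault rate (2*lambda from state 3, lambda from state 2)
    """
    a = delta + rho + ncf_rate
    b = eps + ncf_rate
    c = delta / a + (rho / a) * ((1.0 - q) * eps / b)
    s = (rho / a) * (q * eps / b)
    return c, s, 1.0 - c - s


lam = 1e-4                                        # hypothetical processor failure rate
params = dict(delta=3600.0, rho=360.0, eps=720.0, q=0.01)
c3, s3, n3 = care3_exits(ncf_rate=2 * lam, **params)
c2, s2, n2 = care3_exits(ncf_rate=lam, **params)
print(c3, s3, n3)
print(c2, s2, n2)
```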

3.2.2. State-Dependent FEHM— Overriding the Default Model 

Suppose we want to override the FEHM file FEHM.HRP associated with the processor failures in the three-processor two-memory example. Assume that from state 2, the recovery from a processor failure is more closely modeled by a different FEHM model, which is stored in a separate FEHM file. In this case, we change the description of the Markov state transition from 2*LAMBDA; to 2*LAMBDA followed by a colon and the name of the new FEHM file. This change does not affect any other transitions triggered by a processor failure. If we instead want to turn off the FEHM for this transition, we can use the keyword NONE. For this example, the state transition is 2*LAMBDA:NONE;.
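Splitting such a transition into its rate and its FEHM override can be sketched as below. This assumes only the colon-separated form shown in the text (2*LAMBDA:NONE;); the function name is hypothetical, and the full syntax accepted by HARP may be broader.

```python
def parse_transition(spec):
    """Split e.g. '2*LAMBDA:NONE;' into ('2*LAMBDA', 'NONE').

    The second item is None when no override is given, 'NONE' when the
    FEHM is switched off, or otherwise the name of the overriding FEHM file.
    """
    body = spec.strip().rstrip(";").strip()
    rate, sep, fehm = body.partition(":")
    return rate.strip(), (fehm.strip() if sep else None)


print(parse_transition("2*LAMBDA:NONE;"))
print(parse_transition("3*LAMBDA ;"))
```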
(rather than use a FEHM^ ™ vera S e val » es

Ti the lack of a model) cannot 

rr Pe “ 

3.3. Numerical Solution Techniques 

After the FEHM's have been solved and their exit probabilities have been automatically inserted into the Markov chain, the Markov chain representation of the FORM model remains to be solved. The Markov chain produces a linear system of ordinary differential equations as follows:

$$P'(t) = A(t)\,P(t), \qquad P(0) = P_0$$

$$R(t) = \sum_{i \in \text{UP states}} P_i(t)$$

$$U(t) = \sum_{i \in \text{DOWN states}} P_i(t)$$

For models with repair, where repair transitions emanate from down states, HARP interprets R(t) and U(t) as the instantaneous availability and unavailability.
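As an illustration of these equations, the sketch below integrates P'(t) = A P(t) for a hypothetical three-processor chain with perfect coverage and sums the UP and DOWN state probabilities. It uses the MODELNAME.MAT convention that entry (i, j) of A is the rate from state j to state i. A classical fixed-step Runge-Kutta scheme keeps the example short; it is not HARP's adaptive solver.

```python
import math


def derivative(A, p):
    """Right-hand side A P, where A[i][j] is the rate from state j to state i."""
    return [sum(a_ij * p_j for a_ij, p_j in zip(row, p)) for row in A]


def rk4(A, p0, t_end, steps):
    """Classical fixed-step fourth-order Runge-Kutta for P'(t) = A P(t)."""
    h = t_end / steps
    p = list(p0)
    for _ in range(steps):
        k1 = derivative(A, p)
        k2 = derivative(A, [x + 0.5 * h * d for x, d in zip(p, k1)])
        k3 = derivative(A, [x + 0.5 * h * d for x, d in zip(p, k2)])
        k4 = derivative(A, [x + h * d for x, d in zip(p, k3)])
        p = [x + h / 6 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p


lam, t = 1e-3, 1000.0          # hypothetical failure rate (per hour) and mission time
# States 0, 1, 2: three, two, one processor up (UP); state 3: system failed (DOWN).
A = [[-3 * lam, 0.0, 0.0, 0.0],
     [3 * lam, -2 * lam, 0.0, 0.0],
     [0.0, 2 * lam, -lam, 0.0],
     [0.0, 0.0, lam, 0.0]]
p = rk4(A, [1.0, 0.0, 0.0, 0.0], t, steps=1000)
R = p[0] + p[1] + p[2]          # sum over UP states
U = p[3]                        # sum over DOWN states
print(R, U, (1 - math.exp(-lam * t)) ** 3)
```

For this pure-death chain of three independent units, the exact unreliability is (1 - e^(-lambda t))^3, which the numerical result can be checked against.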

3.3.1. Default Solution Technique 

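By default, HARP solves this system with a variation of the adaptive Runge-Kutta method, GERK (ref. 50). GERK has been reliable and robust for a large variety of models. However, for stiff models it can take a long time; thus, an alternative solver for stiff systems is provided (see section 3.3.2).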

— “jrss t r- rr r 

Model deviations contributed by the FORM/FEHM decomposition are conservative. When the sojourn times of the FORM and FEHM models are not sufficiently disparate, a warning is issued. (See section 3.4.1.)
3.3.2. Stiff System Solution


When the product of the largest rate in the model and the mission time is large (say greater than 100), a special stiff solver is invoked.* With an explicit method such as GERK, a stiff model forces very small time steps before an accurate solution is obtained. Thus, substantial computation time requirements result.

HARP uses a special method called TR-BDF2, which combines the trapezoidal rule with a second-order backward difference formula (BDF2). Because GERK is an explicit method, it is inefficient for stiff models; for stiff models TR-BDF2 takes much less solution time than GERK. The TR-BDF2 method has provided good accuracy and excellent stability on stiff problems. The choice of which solver to use is made internally by HARP. MCI-HARP uses Monte Carlo integration based on the variance reduction technique called importance sampling, which is effective for solving stiff systems.
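A minimal sketch of TR-BDF2 for the linear system P'(t) = A P(t) is shown below. Each step applies the trapezoidal rule over a fraction gamma*h of the step and then the BDF2 formula over the remainder; gamma = 2 - sqrt(2) is a common choice and is an assumption here, as are the fixed step size, the dense pure-Python solver, and the rates in the demo. It is not HARP's implementation.

```python
import math

GAMMA = 2.0 - math.sqrt(2.0)  # fraction of each step taken by the trapezoidal stage


def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b_i] for row, b_i in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def tr_bdf2(A, p0, t_end, steps):
    """Fixed-step TR-BDF2 for P'(t) = A P(t), A[i][j] = rate from state j to state i."""
    n, h = len(p0), t_end / steps
    c1 = GAMMA * h / 2                    # trapezoidal stage over GAMMA*h
    c2 = (1 - GAMMA) / (2 - GAMMA) * h    # BDF2 stage coefficient
    w_mid = 1 / (GAMMA * (2 - GAMMA))
    w_old = (1 - GAMMA) ** 2 / (GAMMA * (2 - GAMMA))
    lhs1 = [[(1.0 if i == j else 0.0) - c1 * A[i][j] for j in range(n)] for i in range(n)]
    lhs2 = [[(1.0 if i == j else 0.0) - c2 * A[i][j] for j in range(n)] for i in range(n)]
    p = list(p0)
    for _ in range(steps):
        Ap = [sum(a * x for a, x in zip(row, p)) for row in A]
        mid = solve(lhs1, [x + c1 * y for x, y in zip(p, Ap)])
        p = solve(lhs2, [w_mid * m - w_old * x for m, x in zip(mid, p)])
    return p


# Stiff demo (hypothetical rates per hour): up --1--> recovering --1e4--> degraded.
A = [[-1.0, 0.0, 0.0],
     [1.0, -1e4, 0.0],
     [0.0, 1e4, 0.0]]
p = tr_bdf2(A, [1.0, 0.0, 0.0], t_end=1.0, steps=20)
print(p, sum(p))
```

With a recovery rate of 10^4 per hour and a step of 0.05 hour, a standard explicit Runge-Kutta method at this step size would be unstable, whereas the implicit stages keep the solution bounded and conserve total probability.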

3.3.3. Computational Precision 

The coverage value precision, and hence the system unreliability computation, depends primarily upon the FEHM being used in the model.

FEHtfs except ESPN FEHM, IJXc" on of t pr"is” „ of ihe coverage computation 
are vahd from un.ty to 10 ““ " e ‘ ™ maller value ? imputed, then a warning message 

as determined by the EPX variable. a sn a 15 , Epx j can be adjusted 

by GERK and the round-off error. 

Each coverage model in the HARP engine utilizes an epsilon variable entitled EPX. If a coverage factor falls within EPX of 1, it is set to 1; likewise, if the coverage factor falls within EPX of 0, it is set to 0. The EPX variable is set twice in each file, once for the nominal computation and once for the bounds computations.
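The EPX rule can be sketched as a small function. Whether the comparison is inclusive, and the EPX value used in the example, are assumptions. The sketch also makes one consequence visible: any coverage deficiency 1 - C smaller than EPX is lost once the value is set to 1.

```python
def snap_coverage(value, epx):
    """Return 1.0 or 0.0 when a coverage factor lies within EPX of either end."""
    if 1.0 - value <= epx:
        return 1.0
    if value <= epx:
        return 0.0
    return value


EPX = 1e-12  # hypothetical setting
print(snap_coverage(1.0 - 1e-14, EPX), snap_coverage(3e-13, EPX), snap_coverage(0.999, EPX))
```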

3.4. Error Bounds 

Two different kinds of bounds are provided by the HARP program: simple model (parametric) bounds and truncation model bounds. Depending on the system being modeled, none, one, or both kinds of bounds are applicable.

The simple parametric bounds are computed for two distinct classes of models: (1) the 
model that does not use any FEHM’s, where behavioral decomposition is not invoked, 

* Stiffness refers to a mathematical model that contains widely separated time constants associated with a system of differential equations. Implicit means that the unknown is not isolated from the other terms of the equation, and explicit means it is.
(2) those models that do invoke FEHM's and behavioral decomposition. The bounds can also be modified to reflect the model state reduction technique.

unrl^flit'lTt”:,::" «"*■ «PO«. -be effect on system 

sensitivity analyses. The simple parametric bo, mdsfor tltlZIel daj'^t™*" "" 7'!"' f 
original user-specified mode] (A/1). 1 bounds for the 

ari : fi i " | TOk « 1 ’ 'b- simple bounds take on two 
computation (prompted by HARP) simnl'c ” SPCC1 ^ an< ,,K ’ user selects the simple bounds 
estimated ma '""f I TT' °" 

u " iik ' thc as -^^3 

The conditions 

° y r“ 

3.4.1. Simple Model (Parametric) Bounds 

3. 4. 1.1. AS IS Model 

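For the AS IS model class, the optimistic bound is obtained by taking the lower bounds on the failure rates and the upper bounds on the repair rates, and the conservative bound is obtained by taking the upper bounds on the failure rates and the lower bounds on the repair rates. The model also produces the predicted unreliability based on the nominal values. The bounds for this model class are true bounds for the original user-specified model (M1).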

3.4. 1.2. Models Using Behavioral Decomposition 

For models using behavioral decomposition, the general form of the simple bounds is given as follows:

$$P(A \cup B) \le \min[1,\; P(A_{high}) + P(B_{max})]$$

$$P(A \cup B) \ge \max[P(A_{low}),\; P(B_{min})]$$

The first rule gives the conservative bound, and the second rule gives the optimistic bound.
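The two rules rest on the elementary inequalities max[P(A), P(B)] <= P(A ∪ B) <= P(A) + P(B), evaluated with the parameter extremes. In this sketch the function name and inputs are hypothetical; A stands for failure by exhaustion of redundancy and B for failure due to imperfect coverage.

```python
def simple_unreliability_bounds(p_a_low, p_a_high, p_b_min, p_b_max):
    """Bounds on P(A or B), A = redundancy exhaustion, B = imperfect coverage."""
    optimistic = max(p_a_low, p_b_min)
    conservative = min(1.0, p_a_high + p_b_max)
    return optimistic, conservative


# Hypothetical probabilities at the mission time.
print(simple_unreliability_bounds(1e-6, 2e-6, 3e-7, 5e-7))
```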

boui^ S '°The' 'system I"* * T* *-.«* — 


i; 


unreliability bound. The syst em faiiure prohahiii^ 

Specifying a repair transition as a failure transition, or vice versa, can cause inverted bounds. HARP does not check for these types of user mistakes.

The validity of these bounds is subject to the multifault models, where applicable (see section 2.7).
The probability of system failure due to exhaustion of redundancy is P(A); P(A_high) and P(A_low) are used instead of P(A) when parametric bounds are computed. The parameters are selected to cause P(A) to be maximum to get P(A_high) and minimum to get P(A_low). The probability of system failure due to imperfect coverage is P(B). When FEHM's are specified for behavioral decomposition, P(B) is computed for the minimum imperfect coverage to get P(B_min) and for the maximum imperfect coverage to get P(B_max). Transient restoration probabilities computed by the FEHM do not contribute to failure by redundancy exhaustion, because no component is lost when transient restoration occurs.

The simple bounds computed by HARP are the bounds on the im ^ ^ 

( \jo A H and M4 of fig 3), which produces the unreliability result (A/4) and can also bou 

M\ under ccrt f , ; nd^ “ 

(mTmC ^TUT^dThe uL-speeified mode, (Ml) (provided al, failure rates 
are constant). 21 . . 

modifying the HARP generated ASCII files. 

Remember that the HARP simple bounds are used for preliminary estimates of unreliability. For models without FEHM's or with the VALUES FEHM, the HARP bounds are true bounds for the user-specified model (M1), that is, the full model.

3.4.2. Truncation Bounds 

As mentioned in section 2.8, truncation bounds are obtained as follows. When the truncated model is solved, the probabilities of the truncation aggregation (TA) states are computed along with those of the other states. For the upper bound, the TA states are automatically considered to be failure states. To use some notation, the states in the truncated model are denoted by the subscript tr and the states in the full model have the subscript M1. The bounds on the system unreliability SU are given by the following, where DS denotes the system down (failure) states:

$$\Pr(DS_{tr}) \le SU_{M1} \le \Pr(TA_{tr}) + \Pr(DS_{tr})$$

faillS;^^ 

transitions can «. the staple upper bound to [^jl^ Ysc,“ta o ^ XHARR The HARP AS IS panic, 
model (Ml). For such systems, the user can ed.t HARP generated ASCII hies 

can also be used to provide accurate results. 
Figure 23. Typical and pathological simple lower bounds: (a) typical; (b) pathological.

fai 1 ur^pr obabtl i F demotes ^ s t a t ° SyStem UnreIiabiHt y 35 wel1 - -dividual 

of component type 1 are tSZlZZSS u" ? mP ° nentS than the * required 
use the" probability ofTetag !„Te F 1,1? S “T ° CC " ra ^ ^ « 

to exhaustion of component 1 All transt t,n H T , T "r P robabil '‘y of failure due 
fnmcation line and Jnot’C “* ” *■“ 

The probability of failure due to exhaustion of component 1, Pr(F1_M1), is bounded as follows:

$$\Pr(F1_{tr}) \le \Pr(F1_{M1}) \le \Pr(TA_{tr}) + \Pr(F1_{tr})$$

manner. Now * Si '" ilar 
The probability of being in the FNCF state before the truncation level is a lower bound on the FNCF probability. The upper bound is taken to be this lower bound probability added to the combined probability of all TA states:

$$\Pr(FNCF_{tr}) \le \Pr(FNCF_{M1}) \le \Pr(TA_{tr}) + \Pr(FNCF_{tr})$$

The bounds on the probability of SPF are obtained in a similar manner as follows:

$$\Pr(FSPF_{tr}) \le \Pr(FSPF_{M1}) \le \Pr(TA_{tr}) + \Pr(FSPF_{tr})$$
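The per-failure-mode bounds all follow the same pattern, so they can be computed together. The failure-mode names and probabilities below are hypothetical values standing in for a solved truncated model, and the function name is illustrative.

```python
def truncation_bounds(p_tr, p_ta):
    """Pr(X_tr) <= Pr(X_M1) <= Pr(TA_tr) + Pr(X_tr) for each failure mode X."""
    return {mode: (p, p + p_ta) for mode, p in p_tr.items()}


# Hypothetical probabilities from a solved truncated model at mission time t.
p_tr = {"F1": 2.0e-7, "F2": 5.0e-8, "FNCF": 1.0e-9, "FSPF": 3.0e-8}
p_ta = 4.0e-9
for mode, (lo, hi) in truncation_bounds(p_tr, p_ta).items():
    print(mode, lo, hi)

# System unreliability: all down states together.
p_ds = sum(p_tr.values())
print("SU", p_ds, p_ds + p_ta)
```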


3.4.3. Combined Bounds 

The combined bounds are obtained as follows. The simple model solution using the optimistic parameters, that is, the lowest failure rates and the highest repair rates and coverage factors, produces an upper bound on the reliability (P_high) of the system (ref. 53):

$$P_{high}(t) = 1 - \max[P_{eshlow}(t),\; P_{covlow}(t)]$$

where P_eshlow is the system failure probability due to exhaustion of system redundancy and P_covlow is the system failure probability due to minimum imperfect coverage.

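If the model from which the simple bounds are derived is a truncated model, then the truncation aggregation states are taken to be operational states (for the optimistic bounds). The simple model solution using the pessimistic parameters, that is, the highest failure rates and the lowest possible coverage factors and repair rates, produces a lower bound on the reliability (P_low) of the system (ref. 53):

$$P_{low}(t) = 1 - \min[P_{eshhigh}(t) + P_{covhigh}(t),\; 1]$$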


If the model from which the simple model bounds are derived is a truncated model, then the truncation aggregation states are taken to be failure states (for the pessimistic bounds). The first type of bounds are reported as simple model bounds, the second type are reported as truncation model bounds, and the combined bounds are reported as truncated simple model bounds.

The use of behavioral decomposition and the instantaneous jump model factors has been proven to result in conservative estimates of reliability (ref. 8) when failure rates are constant (exponential times to failure). Both bounding techniques (simple and truncation) produce bounds on this conservative estimate of reliability. For practical highly reliable systems, the HARP (simple and truncation) bounds also encompass the reliability of the original model. When the model sojourn times are too close to guarantee valid bounds, a warning message is issued.
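The combined reliability bounds can be sketched as below. The optional p_ta argument is an assumption of this sketch: it adds the TA probability of a truncated model to the pessimistic failure probability, which is one way of treating the TA states as failure states. The function name and the example probabilities are hypothetical.

```python
def reliability_bounds(p_esh_low, p_cov_low, p_esh_high, p_cov_high, p_ta=0.0):
    """Return (P_low, P_high) reliability bounds.

    P_high = 1 - max(P_eshlow, P_covlow)          (optimistic parameters)
    P_low  = 1 - min(P_eshhigh + P_covhigh, 1)    (pessimistic parameters)
    p_ta: probability of the TA states of a truncated model, added to the
    pessimistic failure probability (sketch assumption).
    """
    p_high = 1.0 - max(p_esh_low, p_cov_low)
    p_low = 1.0 - min(p_esh_high + p_ta + p_cov_high, 1.0)
    return p_low, p_high


print(reliability_bounds(1e-6, 2e-7, 3e-6, 5e-7))
print(reliability_bounds(1e-6, 2e-7, 3e-6, 5e-7, p_ta=1e-6))
```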
Chapter 4


HARP Structure and User Input 

4.1. Overview of Program Structure 


The HARP solution of a reliability model consists of running three sequential programs: tdrive for model construction, fiface for interface, and harpeng for solution. (See fig. 24.)



Figure 24. HARP program structure. 


In tdrive, the user is stepped through the model construction phase to produce the FORM, any FEHM's, and a dictionary file for the system. The FORM can be entered either as a fault tree (for nonrepairable systems) or as a Markov chain, and the FEHM's can be any of the ones supported by HARP. For each component in the system, the dictionary file contains the logical name of the component, the symbolic name for the failure rate, the name of the FEHM parameter file, and any user-defined near-coincident fault rates. When the name of the FEHM parameter file is specified, the user can create the file at that time or specify that it already exists.

Supplied with this model representation, the tdrive program creates several output files that can be carefully edited and rerun or used by the interface program fiface. By keeping the input and interface programs separate, the user can use the fault tree or Markov chain information for other purposes. The fiface program uses the files created in tdrive to generate the symbolic transition rate matrix for use by the solution program harpeng. Additionally, symbol table information and failure state information are passed to the solution program, which translates these representations into the system unreliability over a user-specified time period. If desired, the optimistic and conservative bounds are also supplied.

4.2. File Naming Conventions 

Several files are created when running the HARP program, and the names of these files are derived from the user-supplied model name. Once the user has specified a model name, that model name is used to create the filenames used throughout the HARP program. The model name (up to nine characters in length, eight for PC) is appended with three-character extensions to produce the reserved files. Special characters that can interfere with the user's operating system should be avoided; e.g., avoid using * or & in a model name or extension. Figure 25 shows the HARP structure and identifies the files associated with each program.

The following sections give a representative listing of each of the HARP files. The contents shown were obtained by running the example of figure 15 through HARP.
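The naming rule can be sketched as follows. The extension list contains only the extensions named in this paper and may not be complete, the length and special-character checks follow the limits stated above, and the function name is hypothetical.

```python
RESERVED_EXTENSIONS = ("TXT", "FTR", "DIC", "INT", "MAT", "SYM", "ALL", "SAM", "USR")


def harp_filenames(model_name, pc=False):
    """Reserved file names for a model (extension list limited to those named in this paper)."""
    limit = 8 if pc else 9
    if not 1 <= len(model_name) <= limit:
        raise ValueError(f"model name must be 1 to {limit} characters")
    if any(ch in model_name for ch in "*&"):
        raise ValueError("avoid special characters such as * and & in model names")
    return [f"{model_name}.{ext}" for ext in RESERVED_EXTENSIONS]


print(harp_filenames("TRIPLEX"))
```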



Figure 25. File structure of HARP. 


4.2.1. MODELNAME.TXT 

The symbolic textual fault tree description file is entered at the terminal by the user in program tdrive. It is then converted to MODELNAME.FTR so that it can be converted to a Markov chain for solution. Typical file contents using the example model of figure 15 are as follows:


NODE 1: TYPE BASIC, 3 OF COMPONENT 1
NODE 2: TYPE BASIC, 2 OF COMPONENT 2
NODE 3: TYPE AND, 1 INPUTS: 1
NODE 4: TYPE AND, 1 INPUTS: 2
NODE 5: TYPE OR, 2 INPUTS: 3 4
NODE 6: TYPE FBOX, INPUT: 5


4.2.2. MODELNAME.FTR

The fault tree description file is created either from the textual description file (.TXT) in the tdrive program or directly from the graphics program and is converted to a Markov chain for solution. Those lines beginning with an 'N' represent fault tree nodes and those beginning with an 'A' designate arcs, arrows, or lines (connectors). The fields for the nodes are: N xcoor ycoor type_node node-label. (See vol. 3 of this TP for more details.) Typical file contents are as follows:

N 1 1 16 3*1 

N 2 2 16 2*2 

N 3 3 19 

A 1 1 3 3 

N 4 4 19 

A 2 2 4 4 

N 5 5 17 

A 3 3 5 5 

A 4 4 5 5 

N 6 6 22 

A 5 5 6 6 
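A reader for the node and arc records might look like the sketch below. The mapping of type codes (16 BASIC, 17 OR, 19 AND, 22 FBOX) is inferred from this example by matching it against the MODELNAME.TXT listing, not taken from a full specification, and the arc fields are kept as raw integers because their layout is described in volume 3. The function name is hypothetical.

```python
# Node type codes as they appear in the example listing (inferred, not a full table).
NODE_TYPES = {16: "BASIC", 17: "OR", 19: "AND", 22: "FBOX"}


def parse_ftr(lines):
    """Parse 'N xcoor ycoor type_node [node-label]' and 'A ...' records."""
    nodes, arcs = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "N":
            x, y, code = int(fields[1]), int(fields[2]), int(fields[3])
            label = fields[4] if len(fields) > 4 else None
            nodes.append({"x": x, "y": y, "type": NODE_TYPES.get(code, code), "label": label})
        elif fields[0] == "A":
            arcs.append(tuple(int(f) for f in fields[1:]))
    return nodes, arcs


SAMPLE = """N 1 1 16 3*1
N 2 2 16 2*2
N 3 3 19
A 1 1 3 3
N 4 4 19
A 2 2 4 4
N 5 5 17
A 3 3 5 5
A 4 4 5 5
N 6 6 22
A 5 5 6 6"""
nodes, arcs = parse_ftr(SAMPLE.splitlines())
print([n["type"] for n in nodes], arcs)
```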

4.2.3. MODELNAME.DIC 

The dictionary file contains the logical name for each component type (e.g., processor, sensor), 
its symbolic failure rate parameter (e.g., lambda, mu) and the FEHM parameter filename (if 
any). It also contains the user-specified interfering component types. This file is created either 
by the textual input program or from the graphics program. Typical file contents are as follows: 

1 PROCESSOR LAMBDA FEHM.CAR

INTERFERING COMPONENT TYPES: 2 

2 MEMORY MU NONE 

INTERFERING COMPONENT TYPES: 

FEIDS (See section 4.3.2.) 

7 6 

The dictionary is required for fault tree FORM's. A Markov chain FORM requires the dictionary
if coverage is to be included in the model. The dictionary matches failure rates with the coverage 
information file to correctly solve the model. It is designed as a tool for both the user and the 
program. The user can make changes in the dictionary file to accommodate any special modeling 
requirements. When creating the dictionary, do not use the symbol $ as a character in a failure 
rate symbol name. This symbol causes the program to ask the user to declare the meaning of 
the name without the symbol $ as well as with it, that is, two symbols result when only one is 
intended. 

4.2.4. MODELNAME.INT 

The symbolic textual Markov chain description file is created by tdrive. It is read by the
interface program fiface and is converted to the symbolic transition rate matrix file (.MAT) for 
the HARP engine. The first line of the file is SORTED if the Markov chain was created from 
a fault tree and either SORTED or UNSORTED if the input was a Markov chain. Typical file
contents are as follows: 

SORTED 

1 2 3*LAMBDA ; 

1 3 2*MU; 

2 4 2*LAMBDA ; 

2 5 2*MU ; 

Refer to section 3.3 for further information on the MODELNAME.INT file. 


4.2.5. MODELNAME.MAT 


The symbolic textual transition rate matrix is read by the HARP engine. The HARP engine requires a specific ordering of its matrix, with row and column values of nonzero entries entered in ascending order. Matrix entry i,j represents a transition rate from state j to state i. For example, entry 2,1 in the second row of the following listing means that i = 2 and j = 1, and 3*LAMBDA*C1 is the transition rate from state 1 to state 2. The number 10 in the first row is the number of model states. Additionally, a symbol X is created and is concatenated to those transitions leading to the failure due to exhaustion state. It serves as a flag variable for the bounds computation. The end of the matrix is flagged with value 0,0. This file is created by program fiface. Typical file contents are as follows:


10
2, 1
3*LAMBDA*C1;
3, 1
2*MU;
4, 2
2*LAMBDA*C2;
5, 2
2*MU;
5, 3
3*LAMBDA*C3;
6, 3
MU*X;
6, 8
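A reader for this layout can be sketched as follows. It is a minimal sketch, assuming one item per line (the state count, then alternating "i, j" and "rate;" lines) and the 0,0 end flag described above; because the listing above is cut off, the sample below is a hypothetical shortened matrix with an explicit 0, 0 terminator. The name parse_mat is illustrative only.

```python
from typing import List, Tuple

def parse_mat(text: str) -> Tuple[int, List[Tuple[int, int, str]]]:
    """Read a .MAT-style listing: state count, then 'i, j' / 'rate;' pairs, ending at 0,0."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    n_states = int(lines[0])
    entries: List[Tuple[int, int, str]] = []
    k = 1
    while k < len(lines):
        i, j = (int(part) for part in lines[k].split(","))
        if (i, j) == (0, 0):
            break  # end-of-matrix flag
        rate = lines[k + 1].rstrip(";").strip()
        entries.append((i, j, rate))
        k += 2
    # The engine expects nonzero entries in ascending row, then column, order.
    positions = [(i, j) for i, j, _ in entries]
    if positions != sorted(positions):
        raise ValueError("nonzero entries are not in ascending row/column order")
    return n_states, entries

sample = "10\n2, 1\n3*LAMBDA*C1;\n3, 1\n2*MU;\n4, 2\n2*LAMBDA*C2;\n5, 2\n2*MU;\n5, 3\n3*LAMBDA*C3;\n6, 3\nMU*X;\n0, 0"
n, entries = parse_mat(sample)
# Transitions flagged with X lead to a failure-due-to-exhaustion state.
flagged = [(i, j) for i, j, rate in entries if rate.split("*")[-1] == "X"]
print(n, len(entries), flagged)  # 10 6 [(6, 3)]
```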

4.2.6. MODELNAME.SYM 


The symbol table and failure (and possibly operational) state information file also contains
whatever symbol table information can be deduced from the graph. Specifically, for each
coverage factor (i.e., Ci), it lists the symbol type number (always a 3 for FEHM types other
than VALUES) and the parameter file containing the FEHM information. Additionally, for
FEHM type VALUES, the corresponding Ri and Si are listed for each component type with
the “VALUES” designation. In this case, the symbol type numbers are 7 for the Ci, 8 for the Ri,
and 10 for the Si. If near-coincident fault rates are being considered, the Ni values are also
printed, with a symbol type number of 9. The symbol X, which appears in the .MAT file (not
shown in MODELNAME.INT) denoting transition to the failure state, is assigned the numeric
value 999.


For the following file contents, the data can be interpreted for figure 17 as follows. The
coverage parameter C1 is defined by the FEHM given by the file FEHM.CAR and has the
number 3 associated with it to designate that the FEHM is not a “VALUES” FEHM. Figure 17
shows C1 as a factor in the transition rate 3λC1 between states 3, 2 and 2, 2. The C4 parameter
has the number 7 below it, which designates that C4 is defined by a VALUES FEHM and that
the value of C4 is 0.7000000000 with tolerance 0.0000000. Likewise, the probabilities and
tolerances for the R4, N4, and S4 transitions are listed. (See section 2.6 for details of
their meaning.)

The user has the option of entering the values for these parameters in program fiface or in
program harpeng. If the user elects to enter the values in the engine, fiface lists the values
as 1.00. All failure states (and operational states whose probabilities are desired) are listed
along with a number that encodes their location in the matrix. To interpret where a failure state
is located, subtract 1000 from the absolute value of the number listed. This file is created by
program fiface. Typical file contents are as follows:


C1
3
FEHM.CAR
C2
3
FEHM.CAR
C3
3
FEHM.CAR
C4
7
0.700000000000 0.000000000000
R4
8
0.100000000000 0.000000000000
N4
9
0.100000000000 0.000000000000
S4
10
0.100000000000 0.000000000000
X
999
END SYMBOL DEFINITION
F1
1007
F2
1006
FSPF
1009
FNCF
1010
END FAILURE STATE DEFINITION
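The "subtract 1000 from the absolute value" rule can be applied mechanically to the failure state section. This minimal sketch assumes the one-item-per-line layout shown above, with each state name followed by its code; failure_state_locations is a hypothetical helper, not a HARP routine.

```python
from typing import Dict

def failure_state_locations(sym_text: str) -> Dict[str, int]:
    """Map each failure-state name to its matrix state number (|code| - 1000)."""
    lines = [ln.strip() for ln in sym_text.splitlines() if ln.strip()]
    start = lines.index("END SYMBOL DEFINITION") + 1
    stop = lines.index("END FAILURE STATE DEFINITION")
    section = lines[start:stop]
    names, codes = section[0::2], section[1::2]  # name, code, name, code, ...
    return {name: abs(int(code)) - 1000 for name, code in zip(names, codes)}

sym_tail = """X
999
END SYMBOL DEFINITION
F1
1007
F2
1006
FSPF
1009
FNCF
1010
END FAILURE STATE DEFINITION"""
locations = failure_state_locations(sym_tail)
print(locations)  # {'F1': 7, 'F2': 6, 'FSPF': 9, 'FNCF': 10}
```

For this example, F1 and F2 land on states 7 and 6, the same states listed under FEIDS in the MODELNAME.DIC example of section 4.2.3.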

4.2.7. MODELNAME.ALL, MODELNAME.SAM, and MODELNAME.USR


The near-coincident fault rate information files are also created by program fiface. For
each coverage factor (i.e., Ci), they list the symbolic value of the near-coincident fault rate.
MODELNAME.ALL lists the symbolic value of the ALL-inclusive near-coincident fault rate;
MODELNAME.SAM and MODELNAME.USR are similar. In the following listing, the expression
given under each Ci is the near-coincident fault rate associated with the Ci transition. It is not
equal to Ci itself.


C1
2*LAMBDA+2*MU;
C2
LAMBDA+2*MU;
C3
3*LAMBDA+MU;
C4
2*LAMBDA+MU;
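One reading of the ALL-inclusive rate reproduces the C1 and C2 entries above. Under this reading, while a processor fault is being handled, a failure of any other operational component counts as a near-coincident fault, so the rate is the sum of the remaining components' failure rates. The sketch below assumes that interpretation and the 3-processor, 2-memory model of the .MAT example (C1 leaves state 1 with 3 processors and 2 memories; C2 leaves state 2 with 2 processors and 2 memories). It is illustrative only, not fiface's algorithm, and all_inclusive_rate is a hypothetical name.

```python
from typing import Dict

def all_inclusive_rate(operational: Dict[str, int], failing: str) -> str:
    """Symbolic sum of the failure rates of every other operational component."""
    terms = []
    for symbol, count in operational.items():
        others = count - 1 if symbol == failing else count  # exclude the faulty unit
        if others == 1:
            terms.append(symbol)
        elif others > 1:
            terms.append(f"{others}*{symbol}")
    return "+".join(terms) + ";"

print(all_inclusive_rate({"LAMBDA": 3, "MU": 2}, "LAMBDA"))  # 2*LAMBDA+2*MU;
print(all_inclusive_rate({"LAMBDA": 2, "MU": 2}, "LAMBDA"))  # LAMBDA+2*MU;
```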


4.2.8. MODELNAME.INP

MODELNAME.INP is an echo file containing the name of the matrix file and the values
for the symbolic rates defined by the user at runtime (of the HARP engine). This file can be
edited after the HARP engine program has completed, which eliminates the need to enter
parameter values during future runs. This file is an output of the HARP engine program. Typical
file contents are as follows:


3P2M.MAT
Symbol No.  Symbol  Type  Value           Variation
1           LAMBDA  1     0.10000000D-03  0.10000000D-06
2           MU      1     0.10000000D-01  0.10000000D-04
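Because the file is written in Fortran double-precision notation (D exponents), a script that reads it must translate the exponent letter. The sketch below is a minimal example, assuming one whitespace-separated row per symbol laid out as in the table above. The helper names are hypothetical, type codes are returned as plain integers (section 4.5 assigns 21 and 22 to the two Weibull types), and a missing variation is taken as zero, as section 4.5 states the engine assumes.

```python
from typing import Dict, Tuple

def fortran_float(text: str) -> float:
    """Convert Fortran double-precision notation (0.1D-03) to a Python float."""
    return float(text.upper().replace("D", "E"))

def parse_inp_rows(inp_text: str) -> Dict[str, Tuple[int, float, float]]:
    """Collect symbol -> (type, value, variation) from the tabular rows."""
    table = {}
    for line in inp_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].isdigit():
            continue  # skip the matrix file name and the header line
        _, symbol, sym_type, value = fields[:4]
        variation = fortran_float(fields[4]) if len(fields) > 4 else 0.0  # missing -> zero
        table[symbol] = (int(sym_type), fortran_float(value), variation)
    return table

inp = """3P2M.MAT
Symbol No.  Symbol  Type  Value           Variation
1           LAMBDA  1     0.10000000D-03  0.10000000D-06
2           MU      1     0.10000000D-01  0.10000000D-04"""
print(parse_inp_rows(inp))
```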
State name: FSPF  0.13012236D-05
State name: FNCF  0.13617744D-06

Reliability = 0.99094265D+00
Unreliability = 0.90573470D-02
Total failure by redundancy exhaustion = 0.90559096D-02

Parametric Bounds using SIMPLE Model:
Lower Bound on Unreliability = 0.90387040D-02
Upper Bound on Unreliability = 0.90745973D-02
See Users Guide, section 3.4.1 for interpretation.
GERK ODE solver: global error value 0.200D-15
                 relative error value 0.100D-08
See Users Guide, section 3.3.1 for interpretation.
0 Reports from the GERK ODE solver.
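A quick way to sanity-check an RS* listing is to confirm that its printed numbers agree with one another. The sketch below copies the values from the sample output above (with D exponents written as e). For this sample, the failure-state probabilities sum to the unreliability, F1 plus F2 gives the redundancy-exhaustion total, and the unreliability falls within the parametric bounds. These identities are checked here for this listing only; they are not presented as HARP's documented definitions.

```python
import math

# Values copied from the MODELNAME.RS* sample listing (D exponents written as e).
state_probs = {
    "F1": 0.99203074e-09,
    "F2": 0.90559086e-02,
    "FSPF": 0.13012236e-05,
    "FNCF": 0.13617744e-06,
}
reliability = 0.99094265e+00
unreliability = 0.90573470e-02
exhaustion = 0.90559096e-02
lower, upper = 0.90387040e-02, 0.90745973e-02

# All failure states together account for the printed unreliability.
assert math.isclose(sum(state_probs.values()), unreliability, rel_tol=1e-6)
# The exhaustion total equals the Fi states (here F1 and F2).
assert math.isclose(state_probs["F1"] + state_probs["F2"], exhaustion, rel_tol=1e-6)
# Reliability and unreliability are complementary, up to printed precision.
assert math.isclose(reliability + unreliability, 1.0, abs_tol=1e-8)
# The computed unreliability lies inside the parametric bounds.
assert lower <= unreliability <= upper
print("listing is internally consistent")
```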

4.2.10. MODELNAME.PT* 


MODELNAME.PT* is a textual file containing the unreliability values for the model; it is
output by the HARP engine program. In the following listing, the leftmost column gives the
times at which the corresponding unreliability values in the right column are computed. These
values are provided as input data for a user's plotting program. The asterisk (*) is an integer,
beginning with 1, that is incremented for each rerun of the engine during the same session. Up to
nine runs may be executed during the same session. (Note: if the program is terminated and then
rerun, all files are destroyed and rewritten.) The contents of this file can be created with a text
editor and used as input to the HARPO module. Also, if data are generated by another program
and can be put into the *.PT file format, HARPO can display that data also, possibly for
comparative analysis. Typical file contents are as follows:
0.00000000   0.00000000E+00
10.00000000  0.90570200E-02
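Data produced by another program can be written in the same two-column layout for comparison in HARPO. The sketch below mimics the Fortran-style formatting of the listing above (mantissa between 0.1 and 1, two-digit exponent). It assumes HARPO accepts one whitespace-separated time/unreliability pair per line; the exact format HARPO requires is not specified here, and fortran_e and write_pt are hypothetical helpers.

```python
import math
from typing import Iterable, Tuple

def fortran_e(x: float, digits: int = 8) -> str:
    """Format x like Fortran E editing: mantissa in [0.1, 1), e.g. 0.90570200E-02."""
    if x == 0.0:
        return f"{0.0:.{digits}f}E+00"
    exponent = math.floor(math.log10(abs(x))) + 1
    mantissa = round(x / 10.0 ** exponent, digits)
    if abs(mantissa) >= 1.0:  # rounding pushed the mantissa up to 1.0
        mantissa /= 10.0
        exponent += 1
    return f"{mantissa:.{digits}f}E{exponent:+03d}"

def write_pt(points: Iterable[Tuple[float, float]]) -> str:
    """One 'time unreliability' pair per line, as in the MODELNAME.PT* listing."""
    return "\n".join(f"{t:.8f} {fortran_e(u)}" for t, u in points)

print(write_pt([(0.0, 0.0), (10.0, 0.0090570200)]))
# 0.00000000 0.00000000E+00
# 10.00000000 0.90570200E-02
```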


Fault/error handling model parameter value files are created in the textual and graphical input
programs. These files have different formats, corresponding to the choice of the fault/error
handling model specification technique. The first line of the file specifies the type of model,
such as HARP.SINGLE.FAULT.MODEL, and the necessary ... These files do not have the
near-coincident fault rate expressions; instead, the near-coincident fault rate expression is an
attribute of the particular coverage symbol. Different coverage symbols may have the same
fault/error handling model parameter files but use different near-coincident fault rate
expressions. Refer to section 3.5 for more details on the input file.

4.2.9. MODELNAME.RS* 

MODELNAME.RS* is a textual file with the reliability/unreliability values for the model and
is an output of the HARP engine program. Each time the HARP engine is rerun during the same
session, a number is appended to the end of MODELNAME.RS*: the asterisk (*) is an integer,
beginning with 1, that is incremented for each rerun of the engine during the same session. The
HARPO module reads this file to make interactive graphical analysis available. (The HARPO
module expects an uppercase filename extension.) Up to nine runs can be executed during the
same session. (Note: if the program is terminated and then rerun, all files are destroyed and
rewritten.) Typical file contents are as follows:


HARP - The Hybrid Automated Reliability Predictor
Release Version 7.0
February 1993

Modelname: 3P2M

Input description (from dictionary file):

Component type: 1  Name: PROCESSOR
Symbolic failure rate: LAMBDA  Constant failure rate: 0.10000000D-03 +/- 0.10000000D-06
FEHM file name: FEHM.CAR
For this FEHM model, the exit probabilities are
(in the absence of near-coincident faults)
  Transient restoration: 0.00000000D+00
  Permanent coverage:    0.99956467D+00
  Single-point failure:  0.43532615D-03

Component type: 2  Name: MEMORY
Symbolic failure rate: MU  Constant failure rate: 0.10000000D-01 +/- 0.10000000D-04
FEHM file name: NONE

ALL-INCLUSIVE near-coincident fault rate used.
Time (in Hours): 0.100D+02

State Probabilities
State name: F1  0.99203074D-09
State name: F2  0.90559086D-02
4.3. Inputting a Markov Chain FORM


4.3.1. State Transition Specification 

A Markov chain is entered in the following format:
state_x state_y rate-transition

The user can enter the information in sorted or unsorted order. If the sorted option is chosen,
the state names must be integers listed in row-wise order (beginning with the number 1). First,
all transitions emanating from state 1 are listed, then those from state 2, and so on. If the
unsorted option is chosen, the state names can be nonintegers listed in any fashion. However, the
first state listed in the input is assigned an initial state probability of 1, while all other states
have an initial state probability of 0. Also, if the input is unsorted, the size of the model is
limited: a total of 500 states and up to 2050 transitions can be included. For sorted (or fault
tree) input, the number of states is increased to 10 000 and the number of transitions to 90 000.
These limits can be changed, however, as explained in section …. For either type of Markov
chain input, the state names cannot be more than … characters long. Comment lines can be
entered by beginning and ending a line with the asterisk (*) so that the line is printed in the
MODELNAME.INT file but ignored by program fiface.
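The ordering rules and size limits above can be checked before a chain is submitted. This is a minimal sketch, assuming transitions have already been read into (from, to, rate) string triples with asterisk comment lines removed; check_markov_input is a hypothetical helper, and the limits are the default values quoted above, which can differ in a modified installation.

```python
from typing import List, Sequence, Tuple

Transition = Tuple[str, str, str]  # (from_state, to_state, rate)

def check_markov_input(transitions: Sequence[Transition], sorted_input: bool) -> List[str]:
    """Return a list of problems with a Markov chain FORM (empty if none found)."""
    problems = []
    states = {s for src, dst, _ in transitions for s in (src, dst)}
    max_states, max_trans = (10_000, 90_000) if sorted_input else (500, 2050)
    if len(states) > max_states:
        problems.append(f"{len(states)} states exceeds limit of {max_states}")
    if len(transitions) > max_trans:
        problems.append(f"{len(transitions)} transitions exceeds limit of {max_trans}")
    if sorted_input:
        if not all(s.isdigit() for s in states):
            problems.append("sorted input requires integer state names")
        else:
            sources = [int(src) for src, _, _ in transitions]
            if sources and sources[0] != 1:
                problems.append("sorted input must begin with state 1")
            if sources != sorted(sources):
                problems.append("transitions are not listed row-wise by source state")
    return problems

print(check_markov_input([("1", "2", "3*LAMBDA"), ("1", "3", "2*MU")], True))  # []
```

For unsorted input, the first state in the first transition would receive the initial probability of 1, so the order of the first line still matters even though the rest of the list can appear in any order.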

4.3.2. Failure State Specification 

In HARP, any state whose label begins with the letter F is considered a failure state. Four
types of failure states are represented. For failure by redundancy exhaustion, one failure state is
associated with each component type in the system. These failure states are labeled Fi, where i
is the component type number failing and F stands for “failure due to exhaustion.” For those
component types with imperfect coverage, the occurrence of a single-point failure and of a
near-coincident fault failure is recognized by failure states FSPF and FNCF, respectively. These
latter two states are created automatically by the interface program, fiface. Any other state label
beginning with F contributes to the system unreliability but not to the specific failure
probabilities.

For sorted input, the user must enter the state names as numbers; therefore, a state label
beginning with an F is not allowed. In this instance, HARP can recognize failure states in two
ways. First, to evaluate bounds in the engine program, those transitions entering failure states
must have an X appended to them. Therefore, when inputting the FORM, append X to the
appropriate transitions to designate the state into which the transition goes as a failure state.
Second, the user can edit the dictionary file (.DIC) by adding the following lines for failure IDs
(FEIDS) to the end of the file:

FEIDS
f1 f2 f3 ... fn

where f1, f2, f3, ..., fn are positive integers that identify the failure states F1, F2, F3, ..., Fn,
respectively. Note that adding X to the failure state in an unsorted Markov chain is also allowed.
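The FEIDS convention is positional: the kth number listed identifies the state reported as Fk. A minimal sketch of that mapping, using the "7 6" line from the MODELNAME.DIC example in section 4.2.3; feids_to_failure_states is a hypothetical helper, not part of HARP.

```python
from typing import Dict

def feids_to_failure_states(feids: str) -> Dict[str, int]:
    """Map F1, F2, ... to the state numbers listed, in order, under FEIDS."""
    ids = [int(token) for token in feids.split()]
    if any(i <= 0 for i in ids):
        raise ValueError("FEIDS entries must be positive integers")
    return {f"F{k}": state for k, state in enumerate(ids, start=1)}

print(feids_to_failure_states("7 6"))  # {'F1': 7, 'F2': 6}
```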

4.3.3. Solving Arbitrary Markov Chains 

HARP can be used to solve arbitrary (general) Markov chains simply by stating that the
model being described is to be solved AS IS. Under this designation, FEHM models are
4.3.4. Sorted Versus Unsorted Input


When the FORM input is a fault tree, HARP converts this fault tree into a sorted Markov
chain. However, many systems cannot be modeled using fault trees; for these, the user must enter
a Markov chain as input. As previously mentioned, the user can enter the Markov chain in
sorted or unsorted order. If the model being evaluated is very large, then the user should input
the Markov chain in sorted order because only 500 states and up to 2050 transitions can be
included in an unsorted model. A sorted model, on the other hand, can have 10 000 states and
up to 90 000 transitions.

Several differences should be noted about sorted and unsorted Markov chains. If X's are
appended to the failure states in an unsorted Markov chain, then HARP evaluates bounds on the
reliability. Moreover, only the probabilities associated with the failure states are given, not those
of the other states. If no X's or FEIDS are specified in the input, the program (at the solution
stage) asks the user to specify failure states. If the user does not specify any failure states, the
state probabilities of all states are printed, while the system reliability is not given (because no
failure states are specified). If the user does specify failure states, then the system reliability and
the failure state probabilities are printed. In either case, bounds are not evaluated.

For sorted input, if FEIDS are specified, then the resulting failure probabilities are listed in
the order in which the states are listed in the FEIDS. For example, if state 12 is mentioned first
under FEIDS in the MODELNAME.DIC file, then the failure state F1 corresponds to state 12,
F2 to the next state mentioned under FEIDS, and so on.

4.3.5. Labeling Transitions 

The Markov chain transitions are normally symbolically labeled with an expression of the
form constant * failure rate. Failure rate transitions are denoted by a single failure rate variable
(e.g., λ or μ) even though HARP does not require the failure rates to be constant. The failure
distribution is specified as either exponential or Weibull at run time. In general, an arc between
states (i, j) and (i - 1, j) is labeled with the value i * λ (if λ is the failure rate of component
type 1). Likewise, an arc between states (i, j) and (i, j - 1) is labeled with the value j * μ (if μ
is the failure rate of component type 2).

Although most transitions are of the type previously described, transitions between arbitrary
pairs of states with arbitrary labels are certainly permitted. However, the following restrictions
apply:

• There can be only one level of parentheses. 

• Only addition and subtraction are allowed within the parentheses. 

• Only addition, subtraction, and multiplication are allowed outside the parentheses. 

• The rate expression cannot exceed 23 characters. 

• Other than the previously listed mathematical symbols, only alphabetic characters (upper 
or lower case) and numerals are understood by the HARP engine. 
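The restrictions listed above can be checked before a label is entered. This is a minimal sketch, assuming the 23-character limit applies to the expression without its trailing semicolon or blanks (the text does not say how they are counted) and that any ":FEHM" override suffix from section 4.7 has already been removed. check_rate_expression is a hypothetical helper; the HARP engine's own parser may differ in detail.

```python
import re
from typing import List

_ALLOWED = re.compile(r"[A-Za-z0-9+\-*()]*")

def check_rate_expression(label: str) -> List[str]:
    """List violations of the transition-label restrictions (empty list if none)."""
    expr = label.strip().rstrip(";").replace(" ", "")  # assumption: ';' and blanks not counted
    problems = set()
    if len(expr) > 23:
        problems.add("longer than 23 characters")
    if not _ALLOWED.fullmatch(expr):
        problems.add("character other than letters, digits, +, -, *, ( )")
    depth = 0
    for ch in expr:
        if ch == "(":
            depth += 1
            if depth > 1:
                problems.add("more than one level of parentheses")
        elif ch == ")":
            depth -= 1
            if depth < 0:
                problems.add("unbalanced parentheses")
                depth = 0
        elif ch == "*" and depth > 0:
            problems.add("multiplication inside parentheses")
    if depth != 0:
        problems.add("unbalanced parentheses")
    return sorted(problems)

print(check_rate_expression("2*(LAMBDA+MU);"))  # []
print(check_rate_expression("2*(A*B)"))  # ['multiplication inside parentheses']
```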

4.4. Inputting a Fault Tree FORM 

4.4.1. Replicated Basic Events 

To reduce the size of a model, HARP allows statistically identical components to be combined 
into single basic events. A replicated basic event is labeled with an expression of the 
Dictionary File
Component  Logical    Symbolic      FEHM
Type       Name       Failure Rate  File Name
1          Processor  Lambda1       Fehm.car
2          P2         Lambda2       None
3          Memory     Lambda3       None

Figure 26. Repeated and distinct basic events in HARP.


should not be considered. A functional dependency gate has a single trigger input (either a
basic event or the output of another gate in the tree), a normal output (reflecting the status of
the trigger event), and one or more dependent output events. The dependent outputs are basic
events that depend on the trigger event. When the trigger event occurs, the dependent basic
events are forced to occur. The occurrence of any dependent basic events has no effect on the
trigger event.

As an example, consider the Cm* system (ref. 20), shown in figure 27, which consists of
clusters of processors and memories connected by links. Each cluster consists of eight local switch 
interface controllers (S. local), each attached to one processor and one 12K-memory module. Each 
processor has 4K of memory on board. The K.map is a cluster controller connecting the S. locals; 
the clusters are connected by intercluster communications (L.inc). A fault in the K.map renders 
the S. locals (and their connected processors and memories) inaccessible, while a fault in the 
S. local makes the processors and memories connected to it inaccessible. 

The development of the fault tree model for the Cm* system (shown in fig. 28) is simplified 
by the use of the functional dependency gate. The dependence of the S. locals on the K.map 
can be captured by two functional dependency gates, each with a K.map trigger event and 
four S. local dependent events. Similarly, the dependence of the processors and memories on the 
S. locals is captured in eight functional dependency gates, each with an S. local as the trigger event 
and the associated processor and memory as the dependent events. The system is considered 
operational as long as three processors can communicate with three memories. As long as the 
L.inc is operational, the requirements can be satisfied by the components of both clusters (thus 
the 6/8 gates). If the L.inc fails, the requirements must be met within one cluster (thus the 
m * n, representing m replications of redundant, functionally identical components of type n.
Replication is useful when modeling statistically identical components with the same failure rate 
value, for example, three processors (fig. 15). Because HARP converts the fault tree to a Markov 
chain for solution, this combination of equivalent components reduces the size of the resulting 
Markov chain considerably. Suppose a fault tree has j basic events, the ith with a replication
factor of $k_i$. If every component were required to fail before the system fails, then the resulting
Markov chain using the replicated basic events would have $\prod_{i=1}^{j}(k_i + 1) - 1 + j$ states.
If the basic events were all separate, then there would be $\sum_{i=1}^{j} k_i$ basic events and the
resulting Markov chain would have $2^{\sum_{i=1}^{j} k_i}$ states. Consider such a system having
5 basic events, each with a replication factor of 3. The Markov chain resulting from the tree with
replicated basic events would have $4^5 - 1 + 5 = 1028$ states, and the Markov chain resulting
from the fault tree without replicated basic events would have $2^{15} = 32\,768$ states.
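The two state counts quoted above can be reproduced directly from the replication factors. A short sketch; the function names are illustrative only.

```python
from math import prod
from typing import Sequence

def states_with_replication(k: Sequence[int]) -> int:
    """prod(k_i + 1) - 1 + j, the count quoted for j replicated basic events."""
    return prod(ki + 1 for ki in k) - 1 + len(k)

def states_without_replication(k: Sequence[int]) -> int:
    """2 ** (sum of k_i) when every replicated component is a separate basic event."""
    return 2 ** sum(k)

factors = [3] * 5  # 5 basic events, each with a replication factor of 3
print(states_with_replication(factors), states_without_replication(factors))  # 1028 32768
```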

4.4.2. Representation of Shared Events 

The user should be aware of a source for potential confusion when constructing fault trees. 
The difficulty is only evident when the fault tree contains shared events because HARP uses a 
representation for shared events that differs from the one often found in the literature. A shared 
event is a basic event that is used more than once in the fault tree, that is, a basic event that 
affects the failure of the system in more than one way and thus has more than one parent gate or 
box. In the literature, such repeated events are sometimes depicted by multiple occurrences of
their basic event node in the fault tree. However, HARP uses the convention that each basic event
node represents a distinct basic event that is assigned a numeral by the user. If a single basic
event is used in more than one place in the fault tree, then it should still be depicted by only one
basic event node, that is, the same node numeral. This basic event node has multiple outgoing
arcs, one to each parent node, to represent the fact that the event is a shared event. The GO
program (see vol. 3 of this TP) represents a shared event as a double circle. The shared basic
event is initially drawn as a single circle. All other occurrences of that event are referenced back
to the initial single-circle basic event. The double-circle notation is provided for drawing
convenience and simplifies the drawing by reducing the number of connecting arcs.

Conversely, two or more basic events with the same label but different node numerals represent
two or more distinct basic events that happen to be the same component type. Having the same
label does not make basic events a shared event; having the same node numeral does.

In the fault tree labeled “Event Repeated” in figure 26, a single component, node 2 labeled 2 
(P2 in MODELNAME.DIC), appears as an input to two different gates (node 2 is shared). In 
the fault tree labeled “Event Not Repeated,” two individual components, nodes 2 and 3, are 
both labeled 2 (P2 in .DIC), each being an input to only one gate (not shared). In the latter 
case, the two individual components, nodes 2 and 3, are functionally different components within 
the fault tree, although they happen to have the same label and therefore are the same type of
component.

4.4.3. Example of a Functional Dependency Gate 

This section introduces a functional dependency gate that can be used to simplify the 
generation of a fault tree model of a system exhibiting structural dependencies of components. 
Suppose a system is configured such that the failure of some component (called a trigger 
component) causes other dependent components to become inaccessible or otherwise unusable. 
In this case, later failures of the dependent components will not further affect the system and 
the basic events that are forced to occur by a trigger event do not count as component failures
when determining the failure level of a state. If the first component failure is a trigger event
that removes two additional components, then the resulting state has three component failures.
However, this state is considered to be on the first failure level of the Markov chain (i.e., it 
is a member of the set of states that result from the covered failure of one component only). 
A coverage model is included on the arc representing the failure of the trigger event. Because 
no explicit arc represents the occurrence of the dependent basic events, no coverage model is 
included for these events. 

Although the functional dependency gate does not increase the modeling capacity of the fault 
tree, it can reduce the effort required to develop a fault tree model of a system with complex 
interconnections. 
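The effect of a functional dependency gate on a state descriptor can be sketched as follows: the explicit failure and every dependent event it forces are recorded as failed, but only the explicit failure raises the failure level. This is a minimal sketch, assuming that a forced event which is itself a trigger (for example, an S.local forced by a K.map fault) also forces its own dependents. The event names KMAP, SL1, P1, and so on are a hypothetical, reduced version of the Cm* example, and coverage is not modeled.

```python
from typing import Dict, FrozenSet, List, Tuple

State = Tuple[FrozenSet[str], int]  # (failed basic events, failure level)

def fail(state: State, event: str, fdep: Dict[str, List[str]]) -> State:
    """Apply one explicit failure, forcing the dependents of any trigger that occurs."""
    failed, level = state
    newly = {event}
    pending = [event]
    while pending:  # assumption: a forced event that is itself a trigger also fires
        trigger = pending.pop()
        for dep in fdep.get(trigger, []):
            if dep not in failed and dep not in newly:
                newly.add(dep)
                pending.append(dep)
    # Only the explicit failure raises the failure level; forced events do not.
    return failed | newly, level + 1

fdep = {"KMAP": ["SL1", "SL2"], "SL1": ["P1", "M1"], "SL2": ["P2", "M2"]}
start: State = (frozenset(), 0)
after = fail(start, "KMAP", fdep)
print(sorted(after[0]), after[1])
# ['KMAP', 'M1', 'M2', 'P1', 'P2', 'SL1', 'SL2'] 1
```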

4.4.4. Example of Priority and Gate 

The priority and gate is logically equivalent to an AND gate, with the added requirement
that the input events occur in a specific order (refs. 54 and 55). In HARP, the number of inputs
for a priority and gate is limited to two for implementation reasons. However, priority and gates
can be cascaded together to achieve the effect of multi-input priority and gates. (See fig. 9.) As
an example of the use of a priority and gate in a fault tree, consider a system that consists of two
channels, as shown in figure 29. Each channel has two sensors, A1 and A2, that are connected
by a device interface unit (DIU). One channel is the primary channel, and the other is an
alternate. The system begins by operating in channel one. Upon the first failure affecting channel
one, the system switches to channel two if channel two has not experienced any component
failures. After switching to channel two, the system continues to use channel two until it fails, at
which time the system fails. If, after the first failure on channel one, the system does not switch
to channel two, then it remains on channel one until channel one fails, at which time the system
fails.


Figure 29. Two-channel system. (Figure labels: Sensor A1, Sensor A2, CPU1, CPU2, Channel 1, Channel 2.)

Figure 30 is a fault tree model of this system. The fault tree for this system utilizes two 
priority and gates, which input to two and gates. The leftmost priority and gate represents 
the situation where a failure occurs on channel one, causing a switch to channel two, and then 
channel two fails. The rightmost priority and gate represents the situation where something fails 
on channel two (and thus when a failure on channel one occurs the system stays on channel one) 
and then channel one fails subsequently. 
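The cascading of two-input priority and gates can be sketched in terms of event occurrence times. This is a minimal sketch, assuming each input is described by its failure time (None if it has not occurred) and that simultaneous occurrences do not satisfy the ordering. A cascaded gate fires only if all inputs occur in strict left-to-right order. pand2 and pand_cascade are hypothetical names.

```python
from functools import reduce
from typing import Optional, Sequence

Time = Optional[float]  # failure time of an input event, or None if it has not occurred

def pand2(first: Time, second: Time) -> Time:
    """Two-input priority AND: occurs when `second` occurs, provided `first` occurred earlier."""
    if first is None or second is None or not first < second:
        return None
    return second

def pand_cascade(times: Sequence[Time]) -> Time:
    """Cascade two-input gates left to right to get a multi-input priority AND."""
    return reduce(pand2, times)

print(pand_cascade([1.0, 2.5, 4.0]))  # 4.0
print(pand_cascade([2.5, 1.0, 4.0]))  # None
```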

4.4.5. Example of Cold Spare Gate 

As an example of the use of the cold spare gate, consider a system consisting of six
components: A, B, C, D, E, and F. The system operates as a triad, with A, B, and C active
Figure 27. Cm* system used as example. (K.map = cluster controller; L.inc = intercluster communications.)

Figure 28. Fault tree model of Cm* system.


2/4 gates). The outputs of the functional dependency gates need not be used as inputs to any 
other gates in this instance. 

The fault tree to Markov chain conversion uses the functional dependency gate to alter the
state descriptor. In a state representing the occurrence of a trigger event, the state descriptor is
changed such that all dependent events are recorded as having failed, if they have not already
done so. No coverage model is considered for these dependent component failures, to ensure that
the possibility of imperfectly covered failure of a component that is unusable or inaccessible
cannot contribute to system failure. (The absence of a coverage model produces an optimistic
result because any coverage value other than unity reduces system reliability.) Furthermore,


and D, E, and F as cold spares (inactive but not subject to failure). Components D
and E can substitute for A if A suffers a covered permanent failure. Component D is the first
spare switched into active use when A fails. If D then experiences a covered failure, E is
switched into active use. Components D and E cannot fail before they are switched into use.
Component F can substitute for either B or C, whichever fails first. Thus, F is a shared cold
spare (also called a pooled spare). Note that if an external event called G causes a spare called D
to fail, then component D is no longer available for the cold spare gate connected to it.

The fault tree model for this system appears in figure 31. The cold spare dependencies are
captured by the gates labeled “Cold Spare Gate.” The leftmost input to the cold spare gate is the
primary component, and the others are the alternates (cold spares) for the primary component.
The order in which the cold spare components input to the cold spare gate (left to right) implies
the order in which the spares are switched into active use. The output of the cold spare gate fires
when the primary component and all its alternates have failed. For shared spares, the output fires
when the primary component has failed and either its alternate fails or its alternate has already
been switched into active use for another component (and thus is unavailable as an alternate for
the primary component). All inputs to cold spare gates must be basic events (possibly replicated).
The input events of the cold spare gate are not allowed to have Weibull failure rates in HARP.
(See section 2.9.1.)
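The switching rules of the cold spare example can be sketched as a replay of failure events. This is a minimal sketch, assuming every listed failure is covered, that a spare that has not yet been switched in cannot fail, and that external events (such as G failing D) are not modeled. The gate names gate_A, gate_B, and gate_C and the function simulate_cold_spares are hypothetical. Each gate lists its primary first, then its alternates in switch-in order, and F appears under two gates because it is pooled.

```python
from typing import Dict, Iterable, List, Set

def simulate_cold_spares(gates: Dict[str, List[str]], failures: Iterable[str]) -> Set[str]:
    """Replay failures in order and return the names of the cold spare gates that fire."""
    active = {gate: comps[0] for gate, comps in gates.items()}  # component now in use
    claimed: Set[str] = set(active.values())  # in use now or used earlier
    failed: Set[str] = set()
    fired: Set[str] = set()
    for comp in failures:
        if comp not in claimed or comp in failed:
            continue  # ignore cold spares (which cannot fail) and repeat failures
        failed.add(comp)
        for gate, comps in gates.items():
            if gate in fired or active[gate] != comp:
                continue
            for alt in comps[1:]:  # left-to-right order of the gate inputs
                if alt not in claimed:
                    claimed.add(alt)
                    active[gate] = alt
                    break
            else:
                fired.add(gate)  # primary and every available alternate are gone
    return fired

gates = {"gate_A": ["A", "D", "E"], "gate_B": ["B", "F"], "gate_C": ["C", "F"]}
print(simulate_cold_spares(gates, ["B", "C"]))  # {'gate_C'}: F is already serving B
```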

4.4.6. Example of Sequence-Enforcing Gate 

Figure 32 shows the use of sequence-enforcing gates to model state-dependent FEHM’s in
fault tree notation. This fault tree models a majority voting 2-out-of-3 system where perfect
coverage is assumed for the first failure and a user-specified FEHM is assigned to the
second failure. The details of this model are discussed in chapter 7.
Figure 31. Fault tree model of system with cold spares.

Figure 32. Fault tree model of system with a sequence-enforcing gate. (Figure labels: P, Q (no FEHM's); 2P1, 2Q1 (FEHM's); State-Dependent FEHM's. Note: Dictionary names are shown here.)


4.5. Editing MODELNAME.INP File 


The MODELNAME.INP file is an output of harpeng. It is a text file containing the name 
of the matrix file and the type and values for the symbolic rates defined by the user at runtime 
(of the HARP engine). This file can be edited after the HARP engine program run is complete. 
This eliminates the need to enter the parameter values again during future runs of the same 

model. 

Because the old format of the MODELNAME.INP file was inconvenient for the user, it is
now written in tabular form, as shown in section 4.2. For a Weibull failure rate, the value of the
symbol refers to the lambda parameter and the variation is the alpha parameter. Also, symbol
type 21 denotes that the symbol has a Weibull failure rate of type 1, and symbol type 22 denotes
that the symbol has a Weibull failure rate of type 2. The HARP engine program still accepts
any MODELNAME.INP file written in the old format and rewrites it in the new format.


When the system being modeled is large, the symbolic matrix generated by the model can be 
too large to store in the data structures internal to HARP. Then, any calculations that require 
reevaluation of the symbolic matrix, for example, the computation of bounds, are not possible. 
In such cases, the HARP engine does not need the parameter variation values in the input. If 

the engine requires the variation value and it has not been specified, then it is assumed to be 
zero. 

The exact size of the model that causes the generated symbolic matrix to be too large for the 
HARP internal data structures cannot be determined because the size of the symbolic matrix 
for the model depends on the size and number of transitions in the model. 


4.6. Entering Dictionary in tdrive 


The dictionary is both a tool for the user and for the HARP program. As previously 
mentioned, the dictionary is required for fault tree input and for Markov chain input if coverage 
modeling is desired. Each component type that can fail in the system should be listed, for
example, processors, sensors, and actuators. (For Markov chain FORM’S, do not enter component
repair information in the dictionary.) For each component type, the failure rate symbol is given,
for example, lambda, mu, or rho. The tdrive program then asks for the coverage model to be used
(ESPN, ARIES, CARE, distributions, moments, empirical, values, none) for the component type. 
The user can then specify a preexisting file containing the appropriate parameter information 
or create the FEHM file by supplying a filename into which the model parameters should be 
stored. Once this information is given for each component type in the system, the user is asked 
about user-defined near-coincident faults (only if there are coverage models other than NONE 
or VALUES). For each appropriate component type, the user lists all component types that 
can affect the given component type in terms of a second fault crashing the system. Once the 
dictionary is complete, the FORM is entered. 

To model certain peculiar features of the system under study, the modeler can alter the 
dictionary manually. However, care must be taken in making any changes to the dictionary. For 
example, the length of the rate parameters and component names cannot exceed 12 characters. 
The user must ensure that the number of component types in the dictionary equals the number 
of failure exhaustion states in the MODELNAME.INT file. The grammar associated with the 
MODELNAME.DIC file is restrictive; therefore, while making changes, the user should not 
delete the blanks at the end of each line. If the interfering component type numbers take up
more than one line in the dictionary, they should be continued on the next line starting from
the first position (i.e., no blanks at the beginning of the line).
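Several of the rules for hand-editing the dictionary can be checked mechanically. This minimal sketch covers the 12-character limit, the ban on $ in rate symbols (section 4.2.3), and the match between the number of component types and failure exhaustion states. It assumes the entries have already been read into (type number, name, rate symbol, FEHM file) tuples and that the number of exhaustion states was counted from MODELNAME.INT. check_dictionary is a hypothetical helper; it does not check blank handling, which depends on the exact file layout.

```python
from typing import List, Sequence, Tuple

DictEntry = Tuple[int, str, str, str]  # (type number, logical name, rate symbol, FEHM file)

def check_dictionary(entries: Sequence[DictEntry], n_exhaustion_states: int) -> List[str]:
    """Return problems found in hand-edited dictionary entries (empty if none)."""
    problems = []
    for type_no, name, rate, _fehm in entries:
        if len(name) > 12:
            problems.append(f"type {type_no}: component name longer than 12 characters")
        if len(rate) > 12:
            problems.append(f"type {type_no}: rate symbol longer than 12 characters")
        if "$" in rate:
            problems.append(f"type {type_no}: '$' in rate symbol")
    if len(entries) != n_exhaustion_states:
        problems.append(f"{len(entries)} component types but {n_exhaustion_states} exhaustion states")
    return problems

entries = [(1, "PROCESSOR", "LAMBDA", "FEHM.CAR"), (2, "MEMORY", "MU", "NONE")]
print(check_dictionary(entries, 2))  # []
```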

4.7. State-Dependent FEHM — Overriding Default Model 

Overriding the FEHM associated with a component type on a specific failure transition is 
possible. To do so, a colon is inserted into the transition label before the semicolon, followed by 
the name of the new FEHM file or by the word NONE (signifying no FEHM for this transition). 
For instance, if the transition label is 4*GAMMA;, we can change the label of the Markov state transition to 4*GAMMA:FEHM.NEW;. This change does not affect any other transitions triggered by a failure of the specific component type. If we want, we can instead turn off the FEHM for this transition by using the keyword NONE. For this example, the state transition label then becomes 4*GAMMA:NONE;.
In some cases, state-dependent FEHMs can be described at the fault tree level with the sequence gate. (See section 2.5 and chapter 7.) As an example, consider a three-processor system where each processor has a failure rate of $\lambda$. When all three processors are operational, the FEHM associated with the transition is, say, FEHM.1. After the first fault (i.e., two operational processors), the FEHM used is FEHM.2. The system fails when two out of three processors have failed. This model can be described by a sequence gate with two inputs A and B, where A occurs before B. Input A has an associated failure rate of $\lambda_1$, which is assigned the numerical value $3\lambda$ at run time. Input B has an associated failure rate of $\lambda_2$, which is assigned the numerical value $2\lambda$ at run time. In the dictionary file, FEHM.1 is assigned to $\lambda_1$, and FEHM.2 is assigned to $\lambda_2$.

If for a particular component type the user has specified type VALUES in the dictionary (rather than a FEHM model), has chosen to ignore near-coincident faults (NONE), or has chosen to ignore coverage completely, then the default model (or lack of a model) cannot be overridden. This restriction exists because each of these choices results in state-independent coverage values, which cannot later be made state dependent. Likewise, a transition cannot be overridden by typing VALUES because of its state independence.
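The override syntax and the VALUES restriction described above can be illustrated with a small label parser. This is a minimal Python sketch, not part of HARP; the function name parse_label and the assumption that a label consists only of a rate expression, an optional colon-separated override, and a terminating semicolon are illustrative.

```python
import re

# Hypothetical helper (not a HARP routine): split a transition label such as
# "4*GAMMA:FEHM.NEW;" into its rate expression and optional FEHM override.
# Assumed grammar: <rate> [":" <FEHM file name or NONE>] ";"
_LABEL = re.compile(r"^\s*([^:;]+?)\s*(?::\s*([^:;]+?)\s*)?;\s*$")


def parse_label(label):
    """Return (rate_expression, override); override is None when no colon is present."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a transition label: {label!r}")
    rate, override = match.group(1), match.group(2)
    # VALUES is state independent, so it cannot be used as a per-transition override.
    if override is not None and override.upper() == "VALUES":
        raise ValueError("a transition cannot be overridden with VALUES")
    return rate, override
```

With this sketch, "4*GAMMA:FEHM.NEW;" yields the rate 4*GAMMA with the override FEHM.NEW, while an override of NONE is returned unchanged so the caller can interpret it as "no FEHM for this transition."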

To make it easier for the user to decipher the state of individual components for a particular Markov chain state, the MODELNAME.INT file can optionally be augmented with comment lines. If the user responds affirmatively to the tdrive question “Include state tuples as comments in the INT file?” then each line (arc designation) in the MODELNAME.INT file is preceded by a comment line (beginning with a * in the first column). This comment line shows the state descriptor for the source and destination states of the arc. For example, suppose there is an arc from some state 41 to some other state 56 in the MODELNAME.INT file. Suppose further that state 41 represents a configuration with three components of type 1, two components of type 2, and zero components of type 3, and that the arc represents a failure of a type 2 component. Then the entry in the MODELNAME.INT file is: 41 56 2*LAMBDA2 ; . If the MODELNAME.INT file is commented, then the line preceding this line is: * 3 2 0 -> 3 1 0 2*LAMBDA2 ;
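The relationship between the source tuple, the failing component type, and the comment line in the example above can be sketched as follows. This is a hypothetical Python helper (arc_comment is not a HARP routine); it assumes that when k identical components of a type are working, the arc rate is written as k times that type's rate symbol, which matches the 2*LAMBDA2 example but is an assumption here.

```python
def arc_comment(source, comp_type, rate_symbol):
    """Return (comment_line, destination_tuple) for one failure arc.

    source      -- working-component counts per type, e.g. (3, 2, 0)
    comp_type   -- 1-based type number of the component that fails on this arc
    rate_symbol -- rate-parameter name for that type, e.g. "LAMBDA2"
    """
    if not 1 <= comp_type <= len(source):
        raise ValueError("component type out of range")
    working = source[comp_type - 1]
    if working == 0:
        raise ValueError("no working component of that type is left to fail")
    dest = list(source)
    dest[comp_type - 1] -= 1
    # Assumption: `working` identical components each fail at rate_symbol.
    label = f"{working}*{rate_symbol} ;"
    src_text = " ".join(str(n) for n in source)
    dst_text = " ".join(str(n) for n in dest)
    return f"* {src_text} -> {dst_text} {label}", tuple(dest)
```

Calling arc_comment((3, 2, 0), 2, "LAMBDA2") reproduces the comment line shown in the text and the destination tuple (3, 1, 0).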
Chapter 5


Technical Information 

5.1. Error and Warning Messages 


An electronic file called MESSAGES.TXT is included on the tape with the source code; this file explains the meaning of each error or warning message from HARP. Within each program, error messages are numbered beginning with 500; warning message numbers are less than 500. To discern the meaning of an error or warning message, simply search the MESSAGES.TXT file for the corresponding message number. The text lists the subroutine name and source file from which the message originated, an explanation of the message, and a course of action (where possible) for correcting the error.
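The numbering rule and the search procedure above can be sketched in a few lines of Python. The helper names are hypothetical, and the layout of MESSAGES.TXT is not specified here, so the search simply returns every line that mentions the number as a whole number.

```python
import re


def message_kind(number):
    """Errors are numbered from 500 upward; warnings are numbered below 500."""
    return "error" if number >= 500 else "warning"


def find_message(messages_text, number):
    """Return the lines of MESSAGES.TXT text that mention `number` as a whole number.

    The digit lookarounds keep 510 from matching inside 5100 or 1510.
    """
    pattern = re.compile(rf"(?<!\d){number}(?!\d)")
    return [line for line in messages_text.splitlines() if pattern.search(line)]
```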


5.2. Installation of HARP Program


The HARP program, which is received on magnetic media, has six directories: TDRIVE, FIFACE, HARPENG, TESTDIR, EXECUTE, and FSOLVER. For a DEC VMS installation, the user has two options for compiling and linking: using command files or using MMS (Module Management System) files. The TDRIVE, FIFACE, and HARPENG directories contain source files, a FORTIT.COM file, and a LINKIT.COM file. The former creates the needed object files, and the latter creates the executable. The user can invoke the *.COM files by typing @FORTIT.COM and @LINKIT.COM, in that order, for each subdirectory. The MMS file is entitled DESCRIP.MMS and also appears in each directory along with the *.COM files. The user can invoke MMS by typing mms in each subdirectory. For a UNIX installation, Makefiles are included in each directory. The user can invoke the Makefiles by typing make in each subdirectory. The executables are entitled TDRIVE for the driver portion of the code, FIFACE for the interface, and HARPENG for the engine. Once compiled, the user can move these executable files to a new location. Generally, we operate with these files in the EXECUTE directory and put this directory in our path. The code is configured to model systems with at most 10000 states and up to 90000 transitions (excluding diagonals of the matrix, as HARP automatically calculates the diagonals). These limits can be changed using the information provided in the next section.
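The build order described above can be summarized as a list of (directory, command) pairs. This Python sketch is only a checklist generator under stated assumptions: the directory names are taken from the text, the FSOLVER directory (built separately, as described below) is left out, and nothing here runs the commands.

```python
# Hypothetical helper: list the per-directory build steps described in the text.
SOURCE_DIRS = ("TDRIVE", "FIFACE", "HARPENG")


def build_commands(platform, use_mms=False):
    """Return (directory, command) pairs in the order they should be run."""
    if platform == "vms":
        # Either the two DCL command procedures (object files, then executable) or MMS.
        steps = ["mms"] if use_mms else ["@FORTIT.COM", "@LINKIT.COM"]
    elif platform == "unix":
        steps = ["make"]
    else:
        raise ValueError(f"unknown platform: {platform}")
    return [(directory, step) for directory in SOURCE_DIRS for step in steps]
```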


Once installed, the version can be tested against the three examples in directory TESTDIR. In addition, scripts of actual runs are found in the EXECUTE directory, named SCRIPT.FT and SCRIPT.MC. These files create 3P2M1BFT and 3P2M1BMC, respectively. The output files of these runs are also in directory EXECUTE.


The directory FSOLVER contains the source code for the CFEHM program, an editor for FEHMs. Use the FORTIT.COM and LINKIT.COM files or the DESCRIP.MMS file listed therein to create the DEC VMS executable. Makefiles can be used for a UNIX installation. Section 5.4 contains information on the CFEHM program. The accompanying tutorial will help familiarize the user with running the HARP program.
5.3. Changing Limit Sizes for HARP Program


The HARP program as received is configured for up to 10000 states and up to 90000 transitions. In addition, the HARP engine program has a limit of 15000 symbols in the model. These parameters can all be changed as described in the following sections.
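Before changing any limits, it can help to measure a model against the defaults quoted above. The following Python sketch counts distinct states and off-diagonal arcs in MODELNAME.INT-style lines. It assumes, as an illustration only, that every non-blank, non-comment line begins with a source state and a destination state, and it skips the optional "*" state-tuple comment lines.

```python
# Default limits quoted in this section.
MAX_STATES = 10000
MAX_TRANSITIONS = 90000


def check_int_limits(lines, max_states=MAX_STATES, max_transitions=MAX_TRANSITIONS):
    """Count states and off-diagonal arcs in MODELNAME.INT-style lines.

    Assumes every non-blank, non-comment line starts with "source destination".
    """
    states = set()
    arcs = 0
    for line in lines:
        text = line.strip()
        if not text or text.startswith("*"):
            continue  # blank line or optional state-tuple comment
        source, dest = text.split()[:2]
        states.update((source, dest))
        if source != dest:
            arcs += 1  # diagonals are computed by HARP, so they are not counted
    return {
        "states": len(states),
        "transitions": arcs,
        "fits": len(states) <= max_states and arcs <= max_transitions,
    }
```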


5.3.1. Program tdrive 

To change the number of states in program tdrive, perform the following steps:

1. In the ft2mc.for source file, locate the following line (it occurs only once) with an editor:

      PARAMETER (MSTATS = oldval)

   where “oldval” is an integer that represents the maximum number of states TDRIVE is currently set to handle. Change “oldval” to the new value “newval” desired for the number of states.

2. Recompile and relink. 
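Step 1 can also be scripted. The following Python sketch rewrites the MSTATS value in the text of ft2mc.for and refuses to proceed unless exactly one matching PARAMETER line is found, mirroring the note that the line occurs only once. The function name is hypothetical, and it only handles the single-parameter form shown above. In fixed-form FORTRAN, also check that the edited statement still ends before column 73.

```python
import re

# Matches the single-parameter form shown in the text: PARAMETER (MSTATS = oldval)
_MSTATS = re.compile(r"(PARAMETER\s*\(\s*MSTATS\s*=\s*)(\d+)(\s*\))", re.IGNORECASE)


def set_mstats(source_text, new_value):
    """Return source_text with the MSTATS parameter set to new_value."""
    new_text, count = _MSTATS.subn(
        lambda m: f"{m.group(1)}{new_value}{m.group(3)}", source_text
    )
    if count != 1:
        raise ValueError(f"expected exactly one MSTATS PARAMETER line, found {count}")
    return new_text
```

After writing the edited text back to ft2mc.for, step 2 (recompile and relink) still applies.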

Internally, tdrive uses linked lists implemented by routines that allocate and manage individual regions of large integer arrays. Two such arrays are defined in file ft2mc.f. The array POOL() has its length defined by parameter PLEN; similarly, DPOOL() has its length defined by parameter DPLEN. If a HARP error message indicates that an operation failed because of insufficient memory, increasing PLEN and/or DPLEN and recompiling the program may prove sufficient.
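The idea of building linked lists inside one fixed-length integer array, and failing when that array is exhausted, can be sketched as follows. This is a hypothetical Python illustration of the general technique, not a transcription of tdrive's routines: the two-slot [value, next] node layout, the free list, and the class name IntPool are all assumptions. The length passed to the constructor plays the role that PLEN plays for POOL().

```python
class IntPool:
    """Singly linked lists stored in one fixed-length integer array.

    Each node occupies two consecutive slots: [value, next]. NIL marks list end.
    """

    NIL = -1

    def __init__(self, length):
        if length < 2:
            raise ValueError("pool too small")
        self.pool = [0] * length
        nodes = length // 2
        # Thread every two-slot node onto a free list.
        for k in range(nodes):
            self.pool[2 * k + 1] = 2 * (k + 1) if k + 1 < nodes else self.NIL
        self.free = 0

    def alloc(self, value, next_node=NIL):
        """Take a node from the free list; raise MemoryError when the pool is full."""
        if self.free == self.NIL:
            raise MemoryError("pool exhausted: increase the pool length (cf. PLEN)")
        node = self.free
        self.free = self.pool[node + 1]
        self.pool[node] = value
        self.pool[node + 1] = next_node
        return node

    def release(self, head):
        """Return every node of the list starting at `head` to the free list."""
        while head != self.NIL:
            nxt = self.pool[head + 1]
            self.pool[head + 1] = self.free
            self.free = head
            head = nxt

    def values(self, head):
        out = []
        while head != self.NIL:
            out.append(self.pool[head])
            head = self.pool[head + 1]
        return out
```

With a pool of length 6, three nodes fit. A fourth allocation raises MemoryError until a list is released, which is the situation the text suggests addressing by enlarging the pool.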


5.3.2. Program fiface 

To change program fiface so that it can run larger problems, the following variables must be changed. The program fiface has two sets of state and transition sizes: those for SORTED input and those for UNSORTED (or with symbolic state names) input. If input is in sorted order (a fault tree converted to a Markov chain from TDRIVE is always in sorted order), then the state size can be up to 10000 and the transition size up to 90000. On the other hand, if the input is not in row-wise order or if the state names are symbolic, then the limits are 500 for state size and 2050 for transition size. If your model is UNSORTED and does not fit in the data structures, first try to put the MODELNAME.INT file in row-wise order with state names having increasing integer values beginning with 1. (This scheme is more efficient and easier than altering the code.) If the state size and transition size are still too small, increase the following sizes for SORTED input:

• NODES: in common block DATACB

The size of this array represents the number of TRANSITIONS in a SORTED model. DATACB contains the transitions of the SORTED model. Files with this common block are

covs, fiface, Id, nxt, printit, transpose


22 By using the HARP truncation option, considerably larger system models can be solved. System models with 71 basic events can be solved with a truncation at level 3 by HARP on a VAX or Sun workstation. A truncation level of 2 or 1 allows the solution of even larger system models (more than 71 basic events) with a possible decrease in accuracy. Larger systems can be solved on larger computing platforms, up to $2^{96}$ states (96 basic events) with appropriate truncation. A system of this size, however, requires expanding the default state size of 10000 to well over 60000. (See section 2.8.)
• ROWPTR: in common block DATACB

The size of this array equals the number of STATES+1 in a SORTED model. ROWPTR is an array of matrix row offsets; for each state in the model, it tells how many transitions emanate from it. Files with this common block are

covs, fiface, Id, nxt, printit, transpose

• PARMS: in common block CHRDAT 

The size of this array equals the number of TRANSITIONS in a SORTED model. PARMS 
contains the rate parameters of the SORTED model. Files with this common block are 

covs, Id, nxt, printit, transpose

• JAT: in common block TRARRY 

The size of this array equals the number of TRANSITIONS in a SORTED model. Generally, JAT is the row pointer array. Files with this common block are


Id, printit, transpose 

• AT: in common block TRARRY 

The size of this array equals the number of TRANSITIONS in a SORTED model. AT points to the character rate parameters for each transition in the model. Files with this common block are

Id, printit, transpose 

• IAT: in common block TRARRY 


The size of this array equals the number of STATES+1 in a SORTED model. Generally, IAT is the column pointer array. Files with this common block are

Id, printit, transpose 

• SYMBOL: in common block COMSYM 

The size of this array equals the number of STATES allowed in an UNSORTED model. 
SYMBOL contains the symbolic name for each state in the UNSORTED model. Files 
with this common block are 

Id, nxt 

• MCPARM: in common block COMSYM 

The size of this array equals the number of TRANSITIONS allowed in an UNSORTED 
model. MCPARM contains the rate parameters for each transition in the UNSORTED 
model. Files with this common block are 

Id, nxt 
• MCNODE: in common block DATACA

The size of this array represents the number of TRANSITIONS allowed in an UNSORTED model. MCNODE contains the transitions of the UNSORTED model. Files with this common block are

Id
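The SORTED layout described in this list (a row-offset array with STATES+1 entries, plus per-transition destination and rate-label arrays) is essentially compressed sparse row storage. The following Python sketch builds such arrays from row-wise sorted arcs. It is an illustration, not fiface code: it uses 0-based offsets, whereas the FORTRAN arrays are presumably 1-based, and the names build_sorted_arrays and arcs_leaving are hypothetical.

```python
def build_sorted_arrays(arcs, n_states):
    """Build ROWPTR/NODES/PARMS-style arrays from row-wise sorted arcs.

    arcs -- list of (source, destination, rate_label), states numbered 1..n_states
    Returns 0-based offsets: the arcs of state s occupy nodes[rowptr[s-1]:rowptr[s]].
    """
    if any(a[0] > b[0] for a, b in zip(arcs, arcs[1:])):
        raise ValueError("arcs must be sorted by source state (row-wise order)")
    counts = [0] * n_states
    for source, _, _ in arcs:
        counts[source - 1] += 1
    rowptr = [0] * (n_states + 1)  # STATES+1 entries, as in the text
    for k in range(n_states):
        rowptr[k + 1] = rowptr[k] + counts[k]
    nodes = [dest for _, dest, _ in arcs]  # one entry per TRANSITION
    parms = [label for _, _, label in arcs]  # rate parameter per TRANSITION
    return rowptr, nodes, parms


def arcs_leaving(state, rowptr, nodes, parms):
    """(destination, rate_label) pairs for one state; count = rowptr[s] - rowptr[s-1]."""
    start, end = rowptr[state - 1], rowptr[state]
    return list(zip(nodes[start:end], parms[start:end]))
```

The difference between consecutive row offsets gives the number of transitions emanating from each state, which is the information ROWPTR is described as carrying.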

In addition, in routine INITSZ of fiface.for, the following four limits must be changed. Note: It is unnecessary to change each occurrence of each variable in the file; instead, change the declaration.

• STSIZ— new number of states for sorted input 

• TRSIZ — new number of transitions for sorted input 

• MCSTZ — new number of states for unsorted input 

• MCTRZ — new number of transitions for unsorted input 

We also have the following file names and variables that must be changed. 

• covs.for: NODES, ROWPTR, PARMS

• fiface.for: NODES, ROWPTR, STSIZ, TRSIZ, MCSTZ, MCTRZ 

• Id.for: NODES, ROWPTR, PARMS, JAT, AT, IAT, SYMBOL, MCPARM, MCNODE

• nxt.for: NODES, ROWPTR, PARMS, SYMBOL, MCPARM 

• printit.for: NODES, ROWPTR, PARMS, JAT, AT, IAT 

• transpose.for: NODES, ROWPTR, PARMS, JAT, AT, IAT
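The first remedy suggested for an UNSORTED model, putting MODELNAME.INT in row-wise order with integer state names beginning with 1, can be sketched in Python. This is a hypothetical preprocessing helper, not part of fiface. It numbers states in order of first appearance, optionally forcing a chosen initial state to be number 1 (an assumption, not a stated HARP rule), and then sorts arcs by source and destination.

```python
def to_sorted_int(arcs, initial_state=None):
    """Renumber symbolic state names 1..N and put arcs in row-wise order.

    arcs          -- list of (source_name, destination_name, rate_label)
    initial_state -- optional name to force to number 1 (an assumption)
    """
    numbering = {}
    if initial_state is not None:
        numbering[initial_state] = 1
    for source, dest, _ in arcs:
        for name in (source, dest):
            if name not in numbering:
                numbering[name] = len(numbering) + 1  # next unused integer
    renumbered = [(numbering[s], numbering[d], label) for s, d, label in arcs]
    renumbered.sort(key=lambda arc: (arc[0], arc[1]))  # by source, then destination
    return renumbered, numbering
```

The returned mapping lets the user translate results for the renumbered states back to the original symbolic names.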

5.3.3. Program harpeng 

To increase the number of states in the HARPENG program, the user must change the 
following parameters and array dimensions: 

• MAXST— the number of states (originally set to 10000) 

• TRANS — the number of transitions (originally set to 90000) 

(We generally assume an average of nine transitions per state, but this number can be lower or higher.)

• MAXSYM — the number of symbols (originally set to 15000). Estimate the number of symbols in the model. A good estimate is 100 + total number of FEHM instances (i.e., $C_1, C_2, \ldots, C_N$ in the model means N FEHM instances) + number of coverage symbols of type VALUES + number of failure states (number of component types + 2) + number of active states whose probabilities the user wants to see (only applies for unsorted Markov chain input; see section 2.7.1). The number of symbols can be reduced tremendously by telling program FIFACE that the user is not interested in near-coincident faults. If so, the coverage values are state independent, and the number of coverage symbols is reduced from (total number of FEHM instances) to (number of components with FEHM models).

• MAXFAC— the number of factors (originally set to 15000) 

• MAXTRM — the number of terms (originally set to 7500)
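The sizing rules of thumb in the list above can be written out as small functions. This Python sketch only restates the text's estimates; the function names and parameters are hypothetical.

```python
def estimate_maxsym(fehm_instances, values_symbols, component_types, watched_states=0):
    """Rule-of-thumb MAXSYM estimate from the text.

    fehm_instances  -- N when C1, C2, ..., CN appear in the model
    values_symbols  -- number of coverage symbols of type VALUES
    component_types -- failure states are estimated as component_types + 2
    watched_states  -- active states whose probabilities are wanted (unsorted input only)
    """
    return 100 + fehm_instances + values_symbols + (component_types + 2) + watched_states


def estimate_trans(maxst, transitions_per_state=9):
    """TRANS sized from an assumed average number of transitions per state."""
    return maxst * transitions_per_state
```

If near-coincident faults are ignored, the text's reduction can be applied by passing the number of components with FEHM models in place of fehm_instances. The default estimate_trans(10000) reproduces the shipped TRANS value of 90000.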
• DIAG: in common block MATRXN

The size of this array equals MAXST. DIAG contains the value of each diagonal entry (which is the negative sum²³ of the outgoing arcs for the state; that is, DIAG(2,2) is the negative sum of all arcs leaving state number 2). Files with this common block are

bounds, eval, fill, gcall, harpeng, set, store, sym

• ENTVAL: in common block MATRXN 

The size of this array equals TRANS. ENTVAL contains the value of each off-diagonal entry of the transition rate matrix. Files with this common block are

bounds, eval, fill, gcall, harpeng, set, store, sym 

• SYMENT: in common block MATRXN 


The size of this array equals MAXFAC. Along with FACTYP and NXTFAC, SYMENT is a vector of symbolic entries. It contains either a pointer to the symbol table or a constant (integer, float, or double). Files with this common block are

bounds, eval, fill, gcall, harpeng, set, store, sym 

• SYMVAL: in common block MATRXN 

The size of this array equals MAXSYM. SYMVAL contains the variation value of each 
symbol in the model. Files with this common block are 

bounds, eval, fill, gcall, harpeng, set, store, sym 

• SYMVAR: in common block MATRXN 

The size of this array equals MAXSYM. SYMVAR contains the variation (if any) of 
each symbol in the model. For Weibull failure rates, SYMVAR is the vector of the alpha 
parameter value. Files with this common block are 


bounds, eval, fill, gcall, harpeng, set, store, sym 
• SYMNOM: in common block MATRXN 


The size of this array equals MAXSYM. SYMNOM contains the nominal value of each symbol in the model. Files with this common block are

bounds, eval, fill, gcall, harpeng, set, store, sym

• ROWPTR: in common block COMM1

The size of this array equals MAXST+1. ROWPTR contains row pointers into the sparse matrix data structure. The difference between ROWPTR(i) and ROWPTR(i+1) is the number of nonzero entries stored for the corresponding row. Files with this common block are

bounds, eval, fill, gcall, harpeng, set

23 The negative of the sum of the absolute values of the outgoing transition rates. 
• COLIND: in common block COMM1


The size of this array equals TRANS. COLIND contains pointers to the sparse matrix 
columns. Files with this common block are 


bounds, eval, fill, gcall, harpeng, set 

• FACLHD: in common block COMM2 

The size of this array equals MAXTRM. Together with NXTTRM, FACLHD constitutes a term node. The term node stores pointers to symbolic expressions. FACLHD points to the head of a factor list containing the symbolic expression (see FACTYP, SYMENT, NXTTRM). NXTTRM points to the next term in the expression (terms are separated by a plus or minus). Files with this common block are

eval, fill, gcall, get, harpeng, hrputil, set, store, sym 

• NXTTRM: in common block COMM2 

The size of this array equals MAXTRM. Together with FACLHD, NXTTRM constitutes a term node. The term node stores pointers to symbolic expressions. FACLHD points to the head of a factor list containing the symbolic expression (see FACTYP, SYMENT, or NXTTRM). NXTTRM points to the next term in the expression (terms are separated by a plus or minus). Files with this common block are

eval, fill, gcall, get, harpeng, hrputil, set, store, sym 

• FACTYP: in common block COMM2 

The size of this array equals MAXFAC. Combined with NXTFAC and SYMENT, FACTYP is a vector of symbolic entries. It contains an integer specifying the type of factor pointed to by FACLHD: 0 = Constant, 1 = x, 2 = -x, 3 = 1 and 4 = \. Files with this common block are

eval, fill, gcall, get, harpeng, hrputil, set, store, sym 

• NXTFAC: in common block COMM2 

The size of this array equals MAXFAC. Combined with FACTYP and SYMENT, 
NXTFAC is a vector of symbolic entries. It contains a pointer to the next factor in the term. Files with this common block are

eval, fill, gcall, get, harpeng, hrputil, set, store, sym 

• ETLHD: in common block COMM3 

The size of this array is between 2 and the size of COLIND, that is, 2 ≤ ETLHD ≤ COLIND. ETLHD determines how much of the matrix can be read in at one time. If the size of the array is small, the evaluation takes longer; however, the size of the program is smaller. If the size of the array is small enough to allow only a portion of the matrix to be read in at a time, bounds and Weibull failure processes are disallowed. The size of this array should be stored in MAXENT. Files with this common block are

eval, fill, gcall, get, harpeng, hrputil, set, store, sym 
• SYMNAM: in common block MATRXC

The size of this array equals MAXSYM. SYMNAM contains the name of each symbol in the model. Files with this common block are

bounds, eval, fill, gcall, get, harpeng, hrputil, sym 

• SYMFN: in common block MATRXC 

The size of this array equals MAXSYM. SYMFN contains the file name of each symbol in the model, if the symbol is a coverage factor. Files with this common block are

bounds, eval, fill, gcall, get, harpeng, hrputil, sym 

• STDEF 

The dimension of STDEF should be changed to MAXST in bounds.for, set.for, and gcall.for.

• WORK 

The dimension of WORK should be changed to (8*MAXST+3) in gerk.for and gcall.for.
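The term and factor arrays described in this list (FACLHD/NXTTRM for terms; FACTYP/SYMENT/NXTFAC for factors) suggest an expression stored as a sum of terms, each a product of factors. The Python sketch below evaluates such an expression and then forms diagonal entries as the negative of the summed absolute outgoing rates (footnote 23). It is a hypothetical illustration, not HARP's eval routine: it uses 0-based indices with -1 as the end-of-list marker, treats the factors of a term as multiplied together, and models only factor types 0, 1, and 2, whose meanings are given above.

```python
NIL = -1  # end-of-list marker assumed for this sketch


def eval_expression(term_head, faclhd, nxttrm, factyp, syment, nxtfac, symval):
    """Sum over terms of the product of each term's factors."""
    total = 0.0
    term = term_head
    while term != NIL:
        product = 1.0
        factor = faclhd[term]
        while factor != NIL:
            kind = factyp[factor]
            if kind == 0:  # constant stored directly in SYMENT
                product *= syment[factor]
            elif kind == 1:  # x: value of the symbol SYMENT points to
                product *= symval[syment[factor]]
            elif kind == 2:  # -x: negated symbol value
                product *= -symval[syment[factor]]
            else:
                raise NotImplementedError(f"factor type {kind} is not modeled here")
            factor = nxtfac[factor]
        total += product
        term = nxttrm[term]
    return total


def diagonal(rowptr, entval):
    """Diagonal entry per row: negative of the sum of absolute outgoing rates."""
    return [-sum(abs(v) for v in entval[rowptr[i]:rowptr[i + 1]])
            for i in range(len(rowptr) - 1)]
```

For example, with symbol values λ = 2.0 and c = 0.9, the single-term expression 3·λ·c evaluates to 5.4. The two-term expression λ − λ·c, stored with a −x factor in its second term, evaluates to 0.2.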
Also, in routine INITMX of hrputil.for, the following five limits must be changed:

• MAXST — new number of states

• MAXTRM — new number of terms

• MAXFAC — new number of factors

• MAXSYM — new number of symbols

• MAXENT — new size of ETLHD array

We also have the following file names and variables that must be changed. Note: It is unnecessary to change each occurrence of each variable in the file; instead, change the declaration.

• bounds.for: DIAGS, ENTVAL, SYMENT, SYMVAL, SYMVAR, SYMNOM, ROWPTR, COLIND, SYMNAM, SYMFN, STDEF

• eval.for: DIAGS, ENTVAL, SYMENT, SYMVAL, SYMVAR, SYMNOM, ROWPTR, COLIND, FACLHD, NXTTRM, FACTYP, NXTFAC, ETLHD, SYMNAM, SYMFN

• fill.for: DIAGS, ENTVAL, SYMENT, SYMVAL, SYMVAR, SYMNOM, ROWPTR, COLIND, FACLHD, NXTTRM, FACTYP, NXTFAC, ETLHD, SYMNAM, SYMFN

• gcall.for: DIAGS, ENTVAL, SYMENT, SYMVAL, SYMVAR, SYMNOM, ROWPTR, COLIND, SYMNAM, SYMFN, STDEF, WORK

• gerk.for: WORK

• get.for: FACLHD, NXTTRM, FACTYP, NXTFAC, ETLHD, SYMNAM, SYMFN

• harpeng.for: DIAGS, ENTVAL, SYMENT, SYMVAL, SYMVAR, SYMNOM, ROWPTR, COLIND, SYMNAM, SYMFN, WORK

• hrputil.for: FACLHD, NXTTRM, FACTYP, NXTFAC, ETLHD, SYMNAM, SYMFN, MAXST, MAXTRM, MAXFAC, MAXSYM, MAXENT

• set.for: DIAGS, ENTVAL, SYMENT, SYMVAL, SYMVAR, SYMNOM, ROWPTR, COLIND, FACLHD, NXTTRM, FACTYP, NXTFAC, ETLHD, STDEF

• store.for: DIAGS, ENTVAL, SYMENT, SYMVAL, SYMVAR, SYMNOM, FACLHD, NXTTRM, FACTYP, NXTFAC, ETLHD

• sym.for: DIAGS, ENTVAL, SYMENT, SYMVAL, SYMVAR, SYMNOM, FACLHD, NXTTRM, FACTYP, NXTFAC, ETLHD, SYMNAM, SYMFN

If the symbolic transition rate matrix fits in these data structures, then bounds and Weibull failure processes are allowed by the program. If the matrix does not fit, then the portion of the matrix that is read in at each step is evaluated and the space is reused. Once the entire matrix is evaluated, the unreliability (or unavailability) is computed.
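The read-evaluate-reuse cycle described above can be sketched as follows. This is a hypothetical Python illustration, not HARP's eval or fill routines. Here, evaluate stands in for whatever turns one symbolic entry into a number, and maxent plays the role of the ETLHD size stored in MAXENT. The second return value reports whether the whole matrix fit in one pass, the condition under which the text allows bounds and Weibull failure processes.

```python
def evaluate_in_chunks(symbolic_entries, evaluate, maxent):
    """Evaluate entries through a work area of at most `maxent` slots.

    Returns (values, whole_matrix_fit). When more than one pass is needed,
    the work area has been reused and the symbolic matrix cannot be reevaluated.
    """
    if maxent < 2:
        raise ValueError("the work area must hold at least two entries")
    values = []
    passes = 0
    for start in range(0, len(symbolic_entries), maxent):
        work_area = symbolic_entries[start:start + maxent]  # portion "read in" this step
        values.extend(evaluate(entry) for entry in work_area)
        passes += 1  # the work area is reused on the next pass
    whole_matrix_fit = passes <= 1
    return values, whole_matrix_fit
```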

5.4. CFEHM — An Editor for FEHM Models 

The stand-alone program CFEHM allows the user to create new FEHM files or change parameter values in existing ones. A user creating a new FEHM file is stepped through the input as in tdrive, with the same models available, as described in section … . After creating the file, the FEHM is solved immediately and the exit probabilities are displayed on the terminal. CFEHM allows the user to change any existing FEHM files and supports changes of single parameters, adding phases as in the ARIES Transient Recovery Mode, or changing distributions as in the ESPN model. The FEHM file is displayed line by line, followed by a (Y/N) option, where yes means that the user wants to change a value and no means that the user wants to retain the old value. If the change option is chosen, the user is prompted for the new input and subsequent dependent values.

5.5. Solving Large Models 

This version of HARP is configured for a problem that requires up to 10000 states.²⁴ A problem of this size can be solved easily on a DEC VAX 750 computer, which is the machine on which HARP was developed. The limiting factor on the size of the problem is the amount of storage required to store the symbolic transition rate matrix. On a DEC VAX … computer, we have solved problems as large as 25000 states (one user reported success in solving a model with 45000 states on a DEC VAX 11-785). This section discusses some methods that the user can utilize to solve very large problems.

If the symbolic matrix is too large to store in the data structures internal to HARP (but there are still fewer than 10000 states), the portion of the matrix that has been stored is evaluated and the space is reused. This method allows the user to solve larger problems (without increasing the data storage requirements) but disallows any calculations that require reevaluation of the symbolic matrix (bounds, Weibull failure rates). If the problem is still too large for HARP to solve, for example if there are too many symbols, the user can ignore the consideration of near-coincident faults. This action significantly reduces the number of distinct symbols in the model because the coverage factors are now state independent.

To solve models that are larger than 10000 states, see section 5.3, which discusses how to change the limits on the various variables used in HARP.

5.6. System Resources 

When HARP is executed on a DEC VAX 11-700 series computer, the following resources are 
required for the default 10 000 state limits: 

24 The configured state size of 10000 is the actual number of state occupancy probabilities that are computed. By using the truncation technique of chapters 1 and 4, systems with … equivalent Markovian states can be solved; however, the computational resources required to use this size model may be unavailable.
• 4096 physical pages (2 MB of real memory)

• 40000 virtual pages (20 MB virtual address space) 
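As a check on the figures above, assuming the standard 512-byte VAX page, the page counts convert to bytes as follows:

$$
4096 \times 512\ \text{B} = 2\,097\,152\ \text{B} = 2\ \text{MB}, \qquad
40000 \times 512\ \text{B} = 20\,480\,000\ \text{B} \approx 20\ \text{MB}
$$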

In a … environment, the same numbers of bytes of real memory and virtual address space are required. The system parameters may also need to be changed to allow a single user a large working set size. These parameters include the working set maximum and the virtual page count. For execution under MS DOS, see section 1.2.4.
Chapter 6

Dynamic Fault Tree Gates 

6.1. Modeling Nonrepairable Systems of Arbitrary Complexity 
With Fault Trees 

The dynamic dependency gates for fault tree modeling are not traditional fault tree gates and will be unfamiliar to most HARP users. This chapter familiarizes the user with the detailed specification and mathematical model for each dynamic dependency gate. The original research on the dynamic fault tree gate models is reported in reference 36. As with any modeling language, the modeler must properly apply these gates according to their specifications.

For the purposes of our discussion, we use the following notation. We identify a particular Markov chain state by listing the components that failed while producing the state in question. For example, if the system has five different components (one of each type present) and if components 2 and 5 have failed (first component 2 followed by component 5), then the system is left in a state denoted by the index I = (2,5). Because the new fault tree gates model system behavior for which the sequence in which the components fail is important, the order in which the components fail can be significant. Therefore, we note that state (2,5) generally cannot be equivalent to state (5,2).

States can also be denoted with tuples, indicating which components are still working and which have failed. We call these component status tuples because they denote the working status of all system components. It is not possible to denote the sequence in which components have failed with only the component status tuple notation. For example, both state (2,5) and state (5,2) can be denoted by the tuple 10110. For this reason, if the sequence of component failures is significant, then the Markov chain state labeling method may need to be extended by appending to the component status tuple some additional information indicating the sequence in which certain events took place. The form of this additional information is determined by the structure of the fault tree. For example, an additional element may be added to the component status region of the tuple for each priority and gate (see section 6.1.3) in the fault tree to indicate which of the gate’s inputs (left or right) fired first (or whether any inputs have fired at all yet). In this way, states can be denoted that are identical in terms of components working or failed but that are distinguished by the sequence in which component failures occurred, as expressed by one or more priority and gates. Yet another type of additional information (described in section 6.1.4) is appended to the component status region of the state tuple for each cold spare gate in the fault tree that shares any of its spares with one or more other cold spare gates in the fault tree.

Components of identical type that serve a redundant function in the system can be grouped together in what are called replicated basic events. The previous notation is easily extended to accommodate these: the tuple notation is nonbinary, with redundant sets of components denoted by members of the tuple whose value is greater than one. Conversely, the component failure list notation can simply contain multiple occurrences of component type i in the list, denoting the failure of more than one member of the redundant set of components of type i. For example, if the system containing five types of components has a group of two redundant components of type 2, the original state tuple (denoting “all components working”) is 12111. If one component of type 2 fails, followed by one component of type 5 failing, followed by the other component of type 2 failing, the resulting state is denoted by either the component failure list (2, 5, 2) or the state tuple 10110. If the sequence in which these components fail is significant (for example, if state (2, 5, 2) must be distinguished in the Markov chain from other failure sequences that lead to the same tuple), then additional information needs to be appended to the tuple 10110 to indicate the sequence in which the components failed.

In general, the form of that additional information is determined by the structure of the fault tree, as previously described. In most cases, the component failure list notation is preferred over the component status tuple notation for reasons of simplicity and clarity.
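To make the two notations concrete, the sketch below converts a component failure list into a component status tuple. It is a hypothetical Python helper (status_tuple is not a HARP routine). It also shows why the tuple alone loses ordering information: (2, 5) and (5, 2) map to the same tuple.

```python
def status_tuple(initial_counts, failure_list):
    """Apply a component failure list to the all-working tuple.

    initial_counts -- working components per type, e.g. (1, 2, 1, 1, 1) for 12111
    failure_list   -- 1-based component types in the order they failed, e.g. (2, 5, 2)
    """
    counts = list(initial_counts)
    for comp_type in failure_list:
        if counts[comp_type - 1] == 0:
            raise ValueError(f"no working component of type {comp_type} left")
        counts[comp_type - 1] -= 1
    return tuple(counts)
```

Both status_tuple((1, 1, 1, 1, 1), (2, 5)) and status_tuple((1, 1, 1, 1, 1), (5, 2)) give (1, 0, 1, 1, 0), the tuple 10110 in the text. This is why sequence-dependent gates need extra state information beyond the tuple.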

We assume that component i fails at a constant rate $\lambda_i$. A state labeled $F_j$ is entered whenever a component of type j fails and there are no spare components to take its place (exhaustion of redundancy); thus, a system crash results. We denote the probability of being in state I at time t by $P_I(t)$, and the Laplace transform for the state by $L_I(s)$. For example, the state probability and the Laplace transform for state (2,5) are $P_{(2,5)}(t)$ and $L_{(2,5)}(s)$, respectively. The initial state (the one in which all components are operational) is denoted by the index 0 (e.g., $P_0(t)$ and $L_0(s)$).

6.1.1. Functional Dependency Gate 

The functional dependency gate is the simplest of the sequence-dependent gates to define. It has an input, called the trigger input, which can be any general event (e.g., the output of any other fault tree gate or any basic event). For ease of presentation, we assume without loss of generality that the trigger event is a single unreplicated basic event. The gate also has a number of dependent events, which must be (possibly replicated) basic events. Finally, the gate has an output (which we call the nondependent output) whose value is always identical to the value of the input event (i.e., the nondependent output event occurs if and only if the input event occurs). This output is provided to simplify the display of complex fault trees where the trigger event is required as an input to another gate.

Figure 33 depicts the functional dependency gate as described here. The trigger event is event 1, the dependent events are events 3 through n, which can be replicated (i.e., groups of m redundant components of type i), and the output event is denoted by the outgoing arc at the top of the gate. Figure 34 depicts the Markov model that defines the behavior of the functional dependency gate shown in figure 33, where the states have been labeled with failed component lists. Figure 35 depicts the same Markov model with the states labeled with component status tuples. Figure 35 gives a better indication of the component failures than figure 34. The Markov model contains two states: the initial state depicting the situation before the trigger event occurs and the final state representing the situation after the trigger event occurs.


Figure 33. Functional dependency gate.
Figure 34. Markov model defining action of functional dependency gate with failed component lists.

Figure 35. Markov model defining action of functional dependency gate with component status tuples.


In figure 35, the initial state tuple indicates that neither the trigger event nor any of the dependent events (interpreted here as component failures) has occurred; that is, the trigger component and all dependent components are still operational (all tuple members are still greater than 0). The final state indicates that the trigger event has occurred (the trigger component has failed), and the action of the gate causes all dependent events to occur as well (all dependent components are forced to fail). Thus, all members of the tuple corresponding to dependent events become zero (note that the tuple member corresponding to component 2 is still nonzero because component 2 was not dependent on the trigger component). The rate at which this occurs is the rate of occurrence of the trigger event, $\lambda_1$. The output event of the functional dependency gate is defined to be equal to the trigger event. The Chapman-Kolmogorov equations and the single-sided Laplace transform equations are given for state 0 in equations (1) and for state 1 in equations (2) as follows:


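$$
\frac{dP_0(t)}{dt} + \lambda_1 P_0(t) = 0, \qquad
sL_0(s) - 1 + \lambda_1 L_0(s) = 0 \;\Rightarrow\; L_0(s) = \frac{1}{s + \lambda_1}
\tag{1}
$$

$$
\frac{dP_1(t)}{dt} = \lambda_1 P_0(t), \qquad
sL_1(s) = \lambda_1 L_0(s) \;\Rightarrow\; L_1(s) = \frac{\lambda_1}{s(s + \lambda_1)}
\tag{2}
$$

The probability of the output event of the gate is as follows:

$$
P_1(t) = 1 - e^{-\lambda_1 t}
\tag{3}
$$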


Although this example is simple enough to be solved by inspection, we have used a three-step analysis here. It yields a mathematical expression for the gate's output event (and hence a definition of the action of the gate), and it illustrates the general procedure used to analyze all these gates. We first identify a minimal Markov model that defines the action of the gate. We then use the Chapman-Kolmogorov equations to derive the Laplace transform equation for the output event of the gate. We finally obtain an expression for the probability of the output event by inverting the Laplace transform.
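The three-step procedure can be cross-checked numerically for this simple gate. The Python sketch below is our own illustration, not part of HARP. It integrates the two-state chain of figure 35 with a fixed-step fourth-order Runge-Kutta scheme and compares the result with equation (3). The function names, the rate and the mission time are hypothetical values chosen only for the example.

```python
import math


def fdep_closed_form(lam1: float, t: float) -> float:
    """Equation (3): probability that the trigger (and hence the gate output) has occurred."""
    return 1.0 - math.exp(-lam1 * t)


def fdep_numeric(lam1: float, t: float, steps: int = 2000) -> float:
    """Integrate dP0/dt = -lam1*P0, dP1/dt = lam1*P0 from P0(0) = 1 with classical RK4."""

    def deriv(p0: float) -> tuple:
        flow = lam1 * p0          # probability flow from state 0 to state 1
        return -flow, flow

    h = t / steps
    p0, p1 = 1.0, 0.0
    for _ in range(steps):
        k1 = deriv(p0)
        k2 = deriv(p0 + 0.5 * h * k1[0])
        k3 = deriv(p0 + 0.5 * h * k2[0])
        k4 = deriv(p0 + h * k3[0])
        p0 += h / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        p1 += h / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return p1


if __name__ == "__main__":
    lam1, t = 1.0e-3, 1000.0   # illustrative rate (per hour) and mission time (hours)
    print(fdep_closed_form(lam1, t), fdep_numeric(lam1, t))
```

Because the chain is linear, the numerical and closed-form values should agree to within the integrator's tolerance.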


6.1.2. Sequence-Enforcing Gate 


We next consider the sequence-enforcing gate. This gate can have any number of inputs. The leftmost input can be any general event (e.g., the output of any other fault tree gate or any basic event). Again, for ease of presentation, we assume without loss of generality that this leftmost event is a single unreplicated basic event. All other inputs to the gate must be (possibly replicated) basic events. The gate has an output that is on when all gate inputs are on (i.e., have occurred).

We note that when an input leads to a descendant node that is a replicated basic event, the input event is not considered to occur until all redundant components of the replicated basic event have failed. Figure 36 shows a sequence-enforcing gate for which the inputs all lead to unreplicated basic events (representing components 1 through n).

The sequence-enforcing gate constrains the occurrence of its input events to follow the left-to-right order in which they appear as inputs to the gate. For example, event 2 is not permitted to occur before event 1. Similarly, event $i+1$ is not permitted to occur before event $i$. This constraint is enforced by not including in the Markov model any states for which event $i+1$ has occurred and event $i$ has not. The resulting Markov model that corresponds to figure 36 is shown in figure 37.


Figure 36. Sequence-enforcing gate with unreplicated basic events.

Figure 37. Markov model defining action of n-input sequence-enforcing gate for output event of gate.


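From the Chapman-Kolmogorov equations, we can derive the Laplace transform equations as follows:

$$\frac{dP_0(t)}{dt} = -\lambda_1 P_0(t), \qquad L_0(s) = \frac{1}{s+\lambda_1} \tag{4}$$

$$\frac{dP_1(t)}{dt} = \lambda_1 P_0(t) - \lambda_2 P_1(t), \qquad L_1(s) = \frac{\lambda_1 L_0(s)}{s+\lambda_2} = \frac{\lambda_1}{(s+\lambda_1)(s+\lambda_2)} \tag{5}$$

$$\frac{dP_{1,\ldots,i}(t)}{dt} = \lambda_i P_{1,\ldots,i-1}(t) - \lambda_{i+1} P_{1,\ldots,i}(t), \qquad L_{1,\ldots,i}(s) = \frac{\lambda_i L_{1,\ldots,i-1}(s)}{s+\lambda_{i+1}} = \frac{\prod_{j=1}^{i}\lambda_j}{\prod_{j=1}^{i+1}(s+\lambda_j)} \tag{6}$$

$$\frac{dP_{F_n}(t)}{dt} = \lambda_n P_{1,\ldots,n-1}(t), \qquad L_{F_n}(s) = \frac{\lambda_n L_{1,\ldots,n-1}(s)}{s} = \frac{\prod_{i=1}^{n}\lambda_i}{s\prod_{i=1}^{n}(s+\lambda_i)} \tag{7}$$

Equation (7) is the Laplace transform equation for the output event of the sequence-enforcing gate. We use the following partial fraction expansion method from reference 44. It applies for distinct $a_i$ and a numerator $N(s)$ of lower degree than the denominator:

$$\frac{N(s)}{\prod_{i=1}^{n}(s+a_i)} = \sum_{i=1}^{n}\frac{C_i}{s+a_i}, \qquad \text{where} \quad C_i = \left.\frac{N(s)\,(s+a_i)}{\prod_{j=1}^{n}(s+a_j)}\right|_{s=-a_i}$$

Applying this expansion and taking $\lambda_0 = 0$, we can obtain a form of equation (7) that is easily invertible:

$$L_{F_n}(s) = \prod_{i=1}^{n}\lambda_i \sum_{k=0}^{n}\frac{1\big/\prod_{j=0,\,j\neq k}^{n}(\lambda_j-\lambda_k)}{s+\lambda_k} \tag{8}$$

Inverting equation (8) gives the probability of the output event of the sequence-enforcing gate:

$$P_{F_n}(t) = \prod_{i=1}^{n}\lambda_i \sum_{k=0}^{n}\frac{e^{-\lambda_k t}}{\prod_{j=0,\,j\neq k}^{n}(\lambda_j-\lambda_k)} \tag{9}$$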
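Equation (9) is easy to mistype, so a quick numerical check can be useful. The following sketch evaluates equation (9) and compares it with a direct Runge-Kutta integration of the chain in figure 37. The function names are hypothetical. The sketch assumes distinct, nonzero rates, which the simple-pole expansion requires.

```python
import math
from typing import List, Sequence


def seq_closed_form(rates: Sequence[float], t: float) -> float:
    """Equation (9) with lambda_0 = 0; assumes the rates are distinct and nonzero."""
    lam = [0.0] + list(rates)
    total = 0.0
    for k in range(len(lam)):
        denom = math.prod(lam[j] - lam[k] for j in range(len(lam)) if j != k)
        total += math.exp(-lam[k] * t) / denom
    return math.prod(rates) * total


def seq_numeric(rates: Sequence[float], t: float, steps: int = 2000) -> float:
    """RK4 integration of the chain 0 -> 1 -> 1,2 -> ... -> F_n of figure 37."""
    n = len(rates)

    def deriv(p: List[float]) -> List[float]:
        d = [0.0] * (n + 1)
        for i, lam in enumerate(rates):   # state i (first i events occurred) -> state i+1
            flow = lam * p[i]
            d[i] -= flow
            d[i + 1] += flow
        return d

    h = t / steps
    p = [1.0] + [0.0] * n
    for _ in range(steps):
        k1 = deriv(p)
        k2 = deriv([x + 0.5 * h * y for x, y in zip(p, k1)])
        k3 = deriv([x + 0.5 * h * y for x, y in zip(p, k2)])
        k4 = deriv([x + h * y for x, y in zip(p, k3)])
        p = [x + h / 6.0 * (a + 2 * b + 2 * c + e)
             for x, a, b, c, e in zip(p, k1, k2, k3, k4)]
    return p[n]


if __name__ == "__main__":
    rates = [1e-3, 2e-3, 3e-3]   # hypothetical failure rates per hour
    print(seq_closed_form(rates, 500.0), seq_numeric(rates, 500.0))
```

For a single input, `seq_closed_form` reduces to equation (3), which is a convenient sanity check.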


These figures and equations are easily modified to accommodate inputs leading to replicated basic events. Figure 38 depicts such a sequence-enforcing gate. The corresponding Markov chain is similar to that depicted in figure 37, except that $m_1$ failures of the components of type 1 occur first. These are followed by $m_2$ failures of the components of type 2, and so on, until the $m_n$ components of type n are the last to fail. A straightforward application of the Chapman-Kolmogorov equations gives the Laplace transform for the gate output when the gate inputs lead to replicated basic events. Equation (7) generalizes to the following:

$$L_{F_n}(s) = \frac{1}{s}\prod_{i=1}^{n}\prod_{j=0}^{m_i-1}\frac{(m_i-j)\lambda_i}{s+(m_i-j)\lambda_i} \tag{10}$$

An expression for the probability $P_{F_n}(t)$ can be obtained from equation (10) by using the same partial fraction expansion procedure that was used before.
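The replicated case of equation (10) is again a pure chain. Each type $i$ contributes $m_i$ stages with rates $m_i\lambda_i, (m_i-1)\lambda_i, \ldots, \lambda_i$. The sketch below is our own and uses hypothetical names. It expands those stage rates and inverts equation (10) by the same simple-pole expansion.

The sketch assumes all stage rates are distinct. When two stages share a rate (for example, $2\lambda_1 = \lambda_2$), the expansion needs repeated-pole terms, and this sketch rejects the input instead.

```python
import math
from typing import List, Sequence


def stage_rates(counts: Sequence[int], rates: Sequence[float]) -> List[float]:
    """Expand equation (10): type i contributes stages (m_i - j) * lam_i, j = 0..m_i-1,
    in the left-to-right order enforced by the gate."""
    stages: List[float] = []
    for m, lam in zip(counts, rates):
        for j in range(m):
            stages.append((m - j) * lam)
    return stages


def output_probability(stages: Sequence[float], t: float) -> float:
    """Invert (1/s) * prod(a / (s + a)) by simple-pole partial fractions, as in eq. (9).
    Valid only when all stage rates are distinct and nonzero."""
    a = [0.0] + list(stages)
    if len(set(a)) != len(a):
        raise ValueError("repeated poles: simple partial fractions do not apply")
    total = sum(math.exp(-a[k] * t)
                / math.prod(a[j] - a[k] for j in range(len(a)) if j != k)
                for k in range(len(a)))
    return math.prod(stages) * total


if __name__ == "__main__":
    s = stage_rates([2, 1], [1e-3, 5e-3])   # hypothetical: two hot units of type 1, one of type 2
    print(s, output_probability(s, 700.0))
```

As a check, two hot units of one type give $(1 - e^{-\lambda t})^2$, which is the probability that both have failed by time $t$.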


Figure 38. An n-input sequence-enforcing gate with replicated basic events.


6.1.3. Priority And Gate 

The sequence-enforcing gate described in the previous section forces component failures to follow a prescribed sequence in the Markov model. It does so by excluding any states for which component failures have occurred in other than the prescribed sequence. In contrast, the priority and gate allows these states in the Markov model. However, if components fail in other than the prescribed sequence for a particular gate, the gate never fires (i.e., the output event of the gate never occurs). The output event occurs only if all input events of the gate occur in the left-to-right sequence in which they appear as inputs to the gate.

Our implementation requires that each priority and gate have at most two inputs. However, two or more priority and gates can be cascaded together to achieve the effect of a multiinput gate. Therefore, we assume a multiinput priority and gate (an unlimited number of inputs) for the purpose of our analysis. The inputs of the priority and gate can be any general event (e.g., the output of any other fault tree gate or any basic event). As previously noted, when an input leads to a replicated basic event, that input is not considered to be on (i.e., the event has not occurred) until all redundant components in the replicated basic event have failed. Again, for ease of presentation, we assume without loss of generality that the inputs to the priority and gate lead to unreplicated basic events.

Figure 39 depicts the Markov model that defines the action of a multi-input priority and gate. Its leftmost input is event 1, the next leftmost is event 2, and so on up to the rightmost input, which is event n. The Laplace transform equation for the output event of the gate can be obtained by again using the Chapman-Kolmogorov equations as follows:


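$$\frac{dP_0(t)}{dt} = -\Big(\sum_{i=1}^{n}\lambda_i\Big)P_0(t), \qquad L_0(s) = \frac{1}{s+\sum_{i=1}^{n}\lambda_i} \tag{11}$$

$$\frac{dP_1(t)}{dt} = \lambda_1 P_0(t) - \Big(\sum_{k\neq 1}\lambda_k\Big)P_1(t), \qquad L_1(s) = \frac{\lambda_1 L_0(s)}{s+\sum_{k\neq 1}\lambda_k} = \frac{\lambda_1}{\big(s+\sum_{k\neq 1}\lambda_k\big)\big(s+\sum_{k=1}^{n}\lambda_k\big)} \tag{12}$$

$$\frac{dP_{1,\ldots,i}(t)}{dt} = \lambda_i P_{1,\ldots,i-1}(t) - \Big(\sum_{k\notin\{1,\ldots,i\}}\lambda_k\Big)P_{1,\ldots,i}(t), \qquad L_{1,\ldots,i}(s) = \frac{\lambda_i L_{1,\ldots,i-1}(s)}{s+\sum_{k\notin\{1,\ldots,i\}}\lambda_k} = \frac{\prod_{j=1}^{i}\lambda_j}{\prod_{j=0}^{i}\big(s+\sum_{k\notin\{1,\ldots,j\}}\lambda_k\big)} \tag{13}$$

$$\frac{dP_{F_n}(t)}{dt} = \lambda_n P_{1,\ldots,n-1}(t), \qquad L_{F_n}(s) = \frac{\lambda_n L_{1,\ldots,n-1}(s)}{s} = \frac{1}{s}\prod_{j=1}^{n}\frac{\lambda_j}{s+\sum_{k\notin\{1,\ldots,j-1\}}\lambda_k} \tag{14}$$

In equation (13), the set $\{1,\ldots,0\}$ is empty, so the $j = 0$ factor is $s+\sum_{k=1}^{n}\lambda_k$. Taking $a_0 = 0$ and $a_i = \sum_{j\notin\{1,\ldots,i-1\}}\lambda_j = \sum_{j=i}^{n}\lambda_j$ for $1 \le i \le n$, we obtain a form of equation (14) that is easily invertible:

$$L_{F_n}(s) = \prod_{i=1}^{n}\lambda_i \sum_{k=0}^{n}\frac{1\big/\prod_{j=0,\,j\neq k}^{n}(a_j-a_k)}{s+a_k} \tag{15}$$

Inverting equation (15) gives the probability of the output event of the priority and gate:

$$P_{F_n}(t) = \prod_{i=1}^{n}\lambda_i \sum_{k=0}^{n}\frac{e^{-a_k t}}{\prod_{j=0,\,j\neq k}^{n}(a_j-a_k)} \tag{16}$$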


We note here that equations (15) and (16) are identical to the expressions derived by Fussell 
(ref. 54) except that our method of numbering the input events of the gate is different. 
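A similar check applies to the priority and gate. The sketch below evaluates equation (16) with $a_0 = 0$ and $a_i = \lambda_i + \cdots + \lambda_n$; the function name and rates are hypothetical. For positive rates the $a_k$ are distinct, so the simple-pole expansion is valid. For large $t$ the result should approach the probability that the inputs fail in the prescribed order, $\prod_i \lambda_i / a_i$.

```python
import math
from typing import Sequence


def priority_and_probability(rates: Sequence[float], t: float) -> float:
    """Equation (16) with a_0 = 0 and a_i = lam_i + ... + lam_n (i = 1..n).
    All rates must be positive so that the a_k are distinct."""
    n = len(rates)
    a = [0.0] + [sum(rates[i:]) for i in range(n)]   # a[i + 1] = sum of rates[i:]
    total = 0.0
    for k in range(n + 1):
        denom = math.prod(a[j] - a[k] for j in range(n + 1) if j != k)
        total += math.exp(-a[k] * t) / denom
    return math.prod(rates) * total


if __name__ == "__main__":
    print(priority_and_probability([2e-3, 1e-3], 800.0))   # hypothetical rates per hour
    print(priority_and_probability([1.0, 2.0, 3.0], 100.0))
```

For two inputs, the result can be compared with the direct integral $P(T_1 < T_2 \le t)$ for exponential failure times.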

6.1.4. Cold Spare Gate 

The cold spare gate is the most complex gate in the set of sequence dependency gates. All inputs to the cold spare gate must be (possibly replicated) basic events. The leftmost input represents one or more primary units that are initially on-line. All inputs to the right of the leftmost input represent alternate (spare) units or groups of redundant units available as pooled spares. These spares are initially powered down (i.e., they are cold spares). Upon the failure of any of the units that are active, replacements are selected from the set of spare units that have not yet been placed on-line. The spare units must be switched into operation in the left-to-right sequence in which they appear as inputs to the cold spare gate.

Figure 39. Markov model defining action of n-input priority and gate.

For example, all spare units from the second leftmost input must be activated as spares for failed components before any units from the third leftmost input can be activated. So long as at least $m_1$ nonfailed components are present in the set of all components that are inputs to the cold spare gate (whether on-line or powered down), $m_1$ active (i.e., on-line) components are always being "used" by the cold spare gate.

Eventually enough failures occur that all remaining components are active: no powered-down spares remain, because they are all on-line replacing components that have previously failed. From then on, the number of components being used by the cold spare gate is the number of components that have not yet failed. This number decreases from $m_1$ down to 1 as subsequent failures occur. Only when all components from all inputs to the cold spare gate have failed does the output of the cold spare gate turn on (i.e., the output event occurs). Figure 40 depicts the general form of the cold spare gate described here.
Figure 40. An n-input cold spare gate with replicated basic events.


6.1.4.1. CSP Gate Behavior Without Intergate Interactions

When none of the inputs to a cold spare gate are shared with any other gate in the fault tree, the operation of this general form of the cold spare gate is defined by the Markov chain shown in figure 41. The probability of the output event of the gate is obtained by solving this Markov chain and summing the probabilities of being in the states $F_i$. This computation can use the analysis procedure from the previous examples. We derive the Laplace transform equation for each $F_i$ state and perform a partial fraction expansion. We then invert the resulting expression to obtain the probability expression (in the time domain) of each state $F_i$.

The action of the cold spare gate is often illustrated more clearly when the Markov chain is labeled with component status tuple notation rather than with the list of component failures notation. As mentioned in section 6.1, a component status tuple indicates how many components of each type in the system are still nonfailed. The term nonfailed refers to both active (on-line) and cold (off-line) components.

Additional information can be added to the component status tuple of each state to further clarify the action of the cold spare gate. That additional information has the form of a second tuple containing one element for each input of the cold spare gate. For fault trees that have several cold spare gates, one such auxiliary tuple can be added for each cold spare gate in the fault tree. The value of each element indicates how many units are currently being used (i.e., on-line and active) for each input of the cold spare gate. The component status tuple is separated from the in-use descriptor tuple by a double bar.

For example, consider a state labeled $m_1 m_2 \cdots m_n \| u_1 u_2 \cdots u_n$. In this state, $m_1$ components of type 1 are nonfailed (i.e., either operating on-line or powered down and awaiting activation), and $m_2$ components of type 2 are nonfailed. Likewise, $u_1$ primary units (of component type 1) are active and on-line (i.e., "in use"), $u_2$ units of type 2 are on-line, and so on. Note that $u_i \le m_i$ for $1 \le i \le n$ at all times.

The initial "all components working" state of a Markov chain for the cold spare gate in figure 40 always has the form $m_1 m_2 \cdots m_n \| m_1 0 \cdots 0$. This form indicates that the $m_1$ primary units are initially all on-line and working correctly. It also indicates that all spare units used by the cold spare gate are initially powered down and therefore not yet in use by the gate. In general, a state is vulnerable to failures of components under a cold spare gate according to the nonzero values of the gate's in-use tuple (the second tuple, denoted by $u_1 u_2 \cdots u_n$). This tuple indicates which components are active for the cold spare gate and hence eligible to fail.

Figure 42 shows the transitions that can be experienced by a general state in the Markov 
chain for the cold spare gate of figure 40 when the component status tuple notation is used. 


Figure 41. Markov chain defining action of n-input cold spare gate with replicated basic events.


Here $u_i \le m_i$ for $1 \le i \le n$, where $m_i$ denotes the number of components of type $i$ that are still working (nonfailed). The incoming transitions come from upstream states; for each such transition, at least one of the $u_i$ active components has failed.
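The transition rules for a general state can be written down compactly. The following Python sketch is our own reading of the rules described above, not HARP code. In a state m||u, only the $u_i$ active units of type $i$ can fail, each at an assumed rate $\lambda_i$. After each failure, the leftmost input that still has a powered-down unit ($m_j > u_j$) supplies the replacement. The names and rates are hypothetical.

```python
from typing import Dict, Sequence, Tuple

State = Tuple[Tuple[int, ...], Tuple[int, ...]]   # (status tuple m, in-use tuple u)


def cold_spare_successors(m: Sequence[int], u: Sequence[int],
                          rates: Sequence[float]) -> Dict[State, float]:
    """Outgoing transitions of state m||u of a single cold spare gate (no sharing)."""
    out: Dict[State, float] = {}
    for i, active in enumerate(u):
        if active == 0:
            continue                      # powered-down or exhausted inputs cannot fail
        new_m, new_u = list(m), list(u)
        new_m[i] -= 1                     # one active unit of type i fails ...
        new_u[i] -= 1
        for j in range(len(new_m)):       # ... and the leftmost cold spare replaces it
            if new_m[j] > new_u[j]:
                new_u[j] += 1
                break
        key = (tuple(new_m), tuple(new_u))
        out[key] = out.get(key, 0.0) + active * rates[i]   # any active unit may fail
    return out


if __name__ == "__main__":
    print(cold_spare_successors((2, 2, 1), (2, 1, 0), (1.0, 2.0, 3.0)))
```

With rates (1.0, 2.0, 3.0), state 221||210 leads to 121||120 at rate $2\lambda_1$ and to 211||210 at rate $\lambda_2$. This matches the walk-through of the example in figure 43.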

As an example, the action of the cold spare gate shown in figure 43 is defined by the Markov chain shown in figure 44. Each state is labeled with a two-part state tuple. The first part of the state tuple is the component status tuple, and a double bar separates it from the second part. The second part is the in-use descriptor tuple for the cold spare gate. It contains the number of components of each type that are currently being used by the cold spare gate. That is, the components being used are on-line and performing the functions of the primary units of the cold spare gate that were initially on-line. The second part of the tuple thus shows exactly which spares are currently active (replacing failed components).
Figure 43. Example cold spare gate with replicated basic events.


Initially, the system begins with all three primary units (all of component type 1) operating correctly. The two spare components of type 2 and the one spare component of type 3 are off-line. This status is indicated by the value 300 in the second part of the state tuple of the initial state in the Markov chain.

Eventually, one of the three primary units fails, and one of the spare components of type 2 is switched on-line to take the place of the failed component. The system moves to the state labeled 221||210. In this case, the second part of the state tuple indicates that two components of type 1 and one spare component of type 2 are in use by the gate. The other spare component of type 2 and the spare component of type 3 are still off-line (powered down) and hence not in use by the gate. Consequently, these two components cannot fail yet, because powered-down components are assumed not to fail.

Once the spare component of type 2 that is selected to replace the failed component is activated, it can fail at any time after it is placed on-line. Therefore, the next failure can be either of two kinds. It can be one of the two remaining components of type 1, in which case the system goes to the state labeled 121||120. Or it can be the active component of type 2, in which case the system goes to the state labeled 211||210.


When one of the components of type 1 fails, the remaining component of type 2 is activated to replace the second failed component of type 1. The system is left operating with one component of type 1 and two components of type 2, as indicated by the second part of the state tuple, 120.

When the active component of type 2 fails, the second component of type 2 is activated to replace the first component of type 2 (which just failed). The system is left operating with two components of type 1 and one component of type 2 active. This status is indicated by the second part of the state tuple, 210. Note that the second part of the state tuple for this state (211||210) has not changed from the second part of the tuple for the previous state (221||210), even though the system has one fewer component of type 2 in working order. The second part of the state tuple records only the number of each type of component that is in use by the cold spare gate. From the previous state to the current state, that number has not changed for any component type. One component of type 2 has failed and been replaced by the other component of type 2; thus, the count of active components is unchanged.
While in state 211||210, the system can experience either a failure of one of the two components of type 1 or a failure of the component of type 2. Either kind of failure results in the last spare (of component type 3) being activated and placed on-line. If one of the components of type 1 fails, the system goes to state 111||111, in which one component of each type is operating on-line. If the component of type 2 fails, the system goes to state 201||201, in which two components of type 1 and the component of type 3 are all operating on-line.

Note that some or all components of type 2 can fail before all components of type 1 fail. Likewise, components of type 3 can be activated before all components of type 1 fail. In this respect, the cold spare gate with replicated basic events differs from the sequence-enforcing gate with replicated events. The difference arises because the cold spare gate enforces the sequence of component activation rather than the sequence of component failure.

The cold spare gate prevents any components that have not been activated from failing. Once activated, however, components can fail at any time. Consequently, once components A and B are both active, component B can fail before component A, even though component A may have been activated before component B. By contrast, the sequence-enforcing gate enforces the sequence of allowed component failures (or, more generally, the sequence of event occurrences). Under that gate, all components of type 1 must fail before any components of type 2 or type 3 can fail.

The remainder of the Markov chain in figure 44 can be interpreted similarly. Note that the sum of all components in use by the cold spare gate (as indicated by the second part of the state tuple) always equals the original number of primary units (in this case, three). This holds until fewer than that number of components remain that have not yet failed. At that point, all remaining spare components have been activated and placed on-line, and subsequent failures are not replaced by spares because none are left. Thus, system performance degrades. The output of the cold spare gate turns on only when one of the states labeled $F_1$, $F_2$, or $F_3$ is reached.
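With these rules, the whole chain of figure 44 can be generated breadth first and solved numerically. The sketch below is illustrative only, and the rates and mission time are hypothetical. It lumps the failed states into a single absorbing state 000||000 rather than keeping $F_1$, $F_2$ and $F_3$ separate, so it has 16 states under these rules. It uses a fixed-step Runge-Kutta integrator instead of the Laplace transform procedure.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple

State = Tuple[Tuple[int, ...], Tuple[int, ...]]   # (status tuple m, in-use tuple u)
Arc = Tuple[int, int, float]


def successors(state: State, rates: Sequence[float]) -> Dict[State, float]:
    m, u = state
    out: Dict[State, float] = {}
    for i, active in enumerate(u):
        if active == 0:
            continue                               # only active units can fail
        nm, nu = list(m), list(u)
        nm[i] -= 1
        nu[i] -= 1
        for j in range(len(nm)):                   # leftmost cold spare is switched in
            if nm[j] > nu[j]:
                nu[j] += 1
                break
        nxt = (tuple(nm), tuple(nu))
        out[nxt] = out.get(nxt, 0.0) + active * rates[i]
    return out


def build_chain(counts: Sequence[int], rates: Sequence[float]) -> Tuple[Dict[State, int], List[Arc]]:
    start: State = (tuple(counts), (counts[0],) + (0,) * (len(counts) - 1))
    index: Dict[State, int] = {start: 0}
    arcs: List[Arc] = []
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for nxt, rate in successors(s, rates).items():
            if nxt not in index:
                index[nxt] = len(index)
                if sum(nxt[0]) > 0:                # the all-failed state is absorbing
                    queue.append(nxt)
            arcs.append((index[s], index[nxt], rate))
    return index, arcs


def solve(n_states: int, arcs: Sequence[Arc], t: float, steps: int = 4000) -> List[float]:
    """Fixed-step RK4 for the state probabilities, starting in state 0."""
    def deriv(q: List[float]) -> List[float]:
        d = [0.0] * n_states
        for a, b, r in arcs:
            d[a] -= r * q[a]
            d[b] += r * q[a]
        return d

    h = t / steps
    p = [1.0] + [0.0] * (n_states - 1)
    for _ in range(steps):
        k1 = deriv(p)
        k2 = deriv([x + 0.5 * h * y for x, y in zip(p, k1)])
        k3 = deriv([x + 0.5 * h * y for x, y in zip(p, k2)])
        k4 = deriv([x + h * y for x, y in zip(p, k3)])
        p = [x + h / 6.0 * (a + 2 * b + 2 * c + e) for x, a, b, c, e in zip(p, k1, k2, k3, k4)]
    return p


if __name__ == "__main__":
    index, arcs = build_chain((3, 2, 1), (1e-3, 2e-3, 3e-3))   # hypothetical rates per hour
    p = solve(len(index), arcs, 1000.0)
    print(len(index), p[index[((0, 0, 0), (0, 0, 0))]])
```

When all three rates are equal, every state at a given failure depth has the same exit rate. The chain then collapses to six stages with rates $3\lambda, 3\lambda, 3\lambda, 3\lambda, 2\lambda$ and $\lambda$, which gives an independent check on the generated chain.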

The system modeled by the fault tree in figure 43 remains operating as long as at least one component among the primary and spare units is working. However, sometimes the system to be modeled has a critical minimum number of components that must be operating for the system to remain functional. For example, suppose the system shown in figure 43 can remain operational only if at least three components from among the primary and spare units remain operational.

This system can be modeled easily by adding an M-out-of-N gate to the cold spare gate. Here N is the total number of components that are inputs to the cold spare gate. M = N - k + 1 is the number of failures that leaves fewer than the critical minimum number k of operational components; here N = 6 and k = 3, so M = 4. All inputs to the cold spare gate must also be inputs to the M-out-of-N gate. The outputs of the M-out-of-N gate and the cold spare gate then become inputs to an or gate. Figure 45 shows the system for which at least three components from among the primary and spare units must be operational for the system to be operational. Figure 46 shows the resulting Markov chain.
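The threshold for the M-out-of-N gate follows from counting failures. With N components and a critical minimum of k, the system fails once N - k + 1 of them have failed. The helper below computes that M and the resulting failure test for a state's component status tuple; the function names are hypothetical. For the six-component example with three required, it gives M = 4.

```python
from typing import Sequence


def m_out_of_n_threshold(counts: Sequence[int], k_min: int) -> int:
    """M for an M-out-of-N gate that fires as soon as fewer than k_min of the
    N = sum(counts) components under the cold spare gate are still operational."""
    n_total = sum(counts)
    if not 1 <= k_min <= n_total:
        raise ValueError("critical minimum must lie between 1 and N")
    return n_total - k_min + 1


def system_failed(status: Sequence[int], counts: Sequence[int], k_min: int) -> bool:
    """Or of the cold spare gate output (nothing left) and the M-out-of-N gate output."""
    failures = sum(counts) - sum(status)
    return sum(status) == 0 or failures >= m_out_of_n_threshold(counts, k_min)


if __name__ == "__main__":
    print(m_out_of_n_threshold((3, 2, 1), 3))   # three type-1, two type-2, one type-3 unit
```

In a chain generator, states for which `system_failed` is true would simply be treated as absorbing.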

6.1.4.2. CSP Gate Behavior With Spares Shared Between CSP Gates

Unlike the sequence-enforcing gate, the cold spare gate can interact with other gates of the fault tree in two special ways. The first occurs when two or more cold spare gates share an alternate (spare) unit or group of pooled spare units. Figure 47 depicts a situation where n - 1 system components share a spare unit between them.

In the figure, the first cold spare gate to have its primary unit fail does not "fire" (i.e., its output event does not occur) until the spare unit subsequently fails. In the meantime, a subsequent failure of the primary unit of any other cold spare gate (before the spare unit fails) causes that cold spare gate to fire immediately. The shared spare is no longer available to replace failing primary units, because it has already been used to replace the first failed primary unit. Figure 48 shows the equivalent Markov model that defines this interaction between cold spare gates sharing spare units.

The probability of the top event of the fault tree of figure 47 is obtained by solving the Markov model in figure 48 for the probability of each of the states labeled $F_i$ and then summing the probabilities of those states. This computation can be accomplished with the analysis method used in the preceding sections to analyze the Markov models for the other gates.

Figure 45. Cold spare gate with critical minimum complement.

Figure 48. Markov model defining action of n - 1 spare-sharing cold spare gates.
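For the spare-sharing arrangement, the chain is small enough to solve in closed form. The sketch below rests on three assumptions of ours that the text does not state. The top event is the or of the n - 1 cold spare gates. Primary unit $i$ fails at rate $\lambda_i$. The shared spare fails at rate $\lambda_n$ once it is switched in.

Its states are: all primaries working with the spare cold; the spare replacing primary $i$; and a single lumped failure state. The function name is hypothetical.

```python
import math
from typing import Sequence


def shared_spare_unreliability(primary_rates: Sequence[float], spare_rate: float,
                               t: float) -> float:
    """Top-event probability for n-1 cold spare gates sharing one cold spare.

    States: O (all primaries up, spare cold), S_i (spare has replaced primary i),
    and a lumped failure state F.  With L = sum of primary rates:
      P_O(t) = exp(-L t)
      dS_i/dt = lam_i P_O - (spare_rate + L - lam_i) S_i,  S_i(0) = 0
    """
    big_l = sum(primary_rates)
    p_o = math.exp(-big_l * t)
    p_s = 0.0
    for lam_i in primary_rates:
        diff = spare_rate - lam_i
        if diff == 0.0:
            p_s += lam_i * t * p_o                 # repeated-root case
        else:
            c_i = spare_rate + big_l - lam_i       # total exit rate of S_i
            p_s += lam_i * (p_o - math.exp(-c_i * t)) / diff
    return 1.0 - p_o - p_s


if __name__ == "__main__":
    print(shared_spare_unreliability([1e-3, 2e-3, 1.5e-3], 1e-3, 500.0))   # hypothetical rates
```

With a single gate, this reduces to the familiar two-stage hypoexponential distribution for a primary followed by one cold spare.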

An additional consideration arises when a replicated basic event representing a group of pooled spares is shared between two or more cold spare gates. The order in which failures occur among activated members of such a group of pooled spares can be significant and must be accounted for in the Markov chain. To illustrate this point, consider the fault tree in figure 49. The Markov chain for this system is shown in figure 50.

The label of each state has three parts. The first part is the component status tuple. The next part, separated from the first by a double bar, is an in-use tuple denoting the components that are in use by the leftmost cold spare gate in figure 49. The last part, again separated by a double bar, is an in-use tuple denoting the components that are in use by the rightmost cold spare gate in figure 49.

In this example, two spare units of type 3 are pooled together and shared by the two cold spare gates in the fault tree. If component 1 fails first, one of the components of type 3 is activated to replace it. If component 2 fails next, the other component of type 3 is activated to replace it. If a component of type 3 (one of the units in the replicated basic event labeled 2*3) then fails, the order in which the two components of type 3 fail becomes significant. The system behavior can be different depending on which one fails first: the component selected to replace the failed component of type 1, or the one selected to replace the failed component of type 2.

Figure 50 illustrates this effect of the order of failures in the transitions from state 0021||010||010 to states 0011||001||010 and 0011||010||001. The set of descendant states of state 0011||001||010 differs from the set of descendant states of state 0011||010||001. Hence the behavior of the system after it reaches one of these states differs from its behavior after it reaches the other. The difference arises because activating spares to replace failed components has made the two components of type 3 functionally distinct units in the system. They were functionally equivalent while they were off-line pooled spares that were powered down.

This transformation need not always occur. For example, suppose that, in figure 49, two components of type 1 instead of one were attached to the leftmost input of the leftmost cold spare gate. Further, suppose that both of these components of type 1 have failed and that both of the components of type 3 were activated to replace them. If the next failure is one of the components of type 3, it does not matter which of the two fails first. The subsequent behavior of the system is the same no matter which component of type 3 fails.

On the other hand, suppose only one of the components of type 1 fails and is replaced by one of the components of type 3. The component of type 2 then fails and is replaced by the other component of type 3, and the next failure is a component of type 3. In this case, the system behavior may depend on which of the two components of type 3 fails first.

Figure 49. Fault tree with cold spare gates sharing pooled spares.


This type of sequence dependency can be subtle and difficult to track in the Markov chain for a fault tree that has cold spare gates sharing pooled spares. However, the auxiliary in-use tuples previously described provide a satisfactory way of accounting for these sequence dependencies in the Markov chain. This state labeling method is perhaps not the most efficient way of recording sequence dependencies in Markov chain states. It does, however, make it comparatively easy for human modeling engineers to understand what is going on in the Markov chain model (and consequently the fault tree model) of the system.
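The pooled-spare bookkeeping can be mimicked with one in-use tuple per gate. The layout below is hypothetical: we inferred it from the description of figure 49 rather than taking it from the figure. Gate A has inputs (component 1, pooled type 3, a type 4 unit). Gate B has inputs (component 2, pooled type 3, the same type 4 unit). Component types are indexed from 0 in the code. The function fails one active unit and lets the affected gate switch in its leftmost unclaimed cold unit.

```python
from typing import List, Sequence, Tuple

InUse = Tuple[Tuple[int, ...], ...]
State = Tuple[Tuple[int, ...], InUse]   # (status tuple, one in-use tuple per gate)


def claimed(in_use: List[List[int]], gates: Sequence[Sequence[int]], ctype: int) -> int:
    """Number of units of component type `ctype` currently switched in by any gate."""
    return sum(u[p] for u, inputs in zip(in_use, gates)
               for p, t in enumerate(inputs) if t == ctype)


def fail_active_unit(state: State, gates: Sequence[Sequence[int]], g: int, pos: int) -> State:
    """Fail one active unit at input `pos` of gate `g`; gate g then switches in a unit
    from its leftmost input whose type still has an unclaimed, powered-down unit."""
    m = list(state[0])
    in_use = [list(u) for u in state[1]]
    if in_use[g][pos] == 0:
        raise ValueError("only active (in-use) units can fail")
    m[gates[g][pos]] -= 1
    in_use[g][pos] -= 1
    for q, qtype in enumerate(gates[g]):
        if m[qtype] > claimed(in_use, gates, qtype):
            in_use[g][q] += 1
            break
    return tuple(m), tuple(tuple(u) for u in in_use)


if __name__ == "__main__":
    gates = [[0, 2, 3], [1, 2, 3]]                     # hypothetical layout (see lead-in)
    s = ((1, 1, 2, 1), ((1, 0, 0), (1, 0, 0)))
    s = fail_active_unit(s, gates, 0, 0)               # component 1 fails
    s = fail_active_unit(s, gates, 1, 0)               # component 2 fails
    print(fail_active_unit(s, gates, 0, 1), fail_active_unit(s, gates, 1, 1))
```

Two failures give distinct states: the type 3 unit used by gate A gives 0011||001||010, and the one used by gate B gives 0011||010||001. When both type 3 units serve the same gate, either failure leads to the same state.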
6.1.4.3. CSP Gate Behavior for Spare Shared With Functional Dependency


The second special intergate interaction of a cold spare gate occurs when a spare unit for a cold spare gate is also a dependent event for a functional dependency gate. Normally, the cold spare gate disallows the spare from failing before the primary unit fails. However, the spare unit may be disabled as a result of the failure of some other component, through the action of a functional dependency gate. In that case the failure of the spare is permitted even though the primary unit may still be operational. This apparent exception to the cold spare gate definition is needed because a disabled spare unit (modeled with the functional dependency gate in this way) is treated as functionally failed, even though the unit may not actually have failed.

Figure 51 shows the simplest fault tree that models this situation. Figure 52 shows the equivalent Markov model that defines this interaction between the functional dependency gate and the cold spare gate. The normal action of the cold spare gate would have prevented state 1,3 (and any of its descendant states) from being generated. As a result, component 3 would have been prevented from failing before component 2 fails, and component 1 would have been prevented from failing before component 2. However, component 3 (the spare of the cold spare gate) is a dependent event of the functional dependency gate. Therefore, state 1,3 is generated and included in the Markov chain along with its descendant states. The probability of the top event of the fault tree of figure 51 is obtained by solving the Markov model in figure 52 for the probability of each of the states labeled $F_i$ and then summing the probabilities of those states.



Figure 51. Fault tree interaction between functional dependency gate and cold spare gate. 
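The exception can be expressed as a small change to the state generator. The sketch below uses hypothetical numbering. Component 1 is the trigger, component 2 is the primary, and component 3 is the cold spare that is also the dependent event. The top event is assumed to be the cold spare gate output. The spare may fail on its own only after the primary has failed, but a trigger failure forces it to fail at once. States are sets of failed components.

```python
from collections import deque
from typing import Callable, FrozenSet, List, Set

TRIGGER, PRIMARY, SPARE = 1, 2, 3   # hypothetical numbering for the simplest case


def next_states(failed: FrozenSet[int]) -> List[FrozenSet[int]]:
    out = []
    for c in (TRIGGER, PRIMARY, SPARE):
        if c in failed:
            continue
        if c == SPARE and PRIMARY not in failed:
            continue                      # a powered-down spare cannot fail by itself
        new = set(failed) | {c}
        if c == TRIGGER:
            new.add(SPARE)                # functional dependency disables the spare
        out.append(frozenset(new))
    return out


def cold_spare_fired(failed: FrozenSet[int]) -> bool:
    return {PRIMARY, SPARE} <= failed    # assumed top event: the cold spare gate output


def reachable(top: Callable[[FrozenSet[int]], bool] = cold_spare_fired) -> Set[FrozenSet[int]]:
    start: FrozenSet[int] = frozenset()
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if top(state):
            continue                      # absorbing: the top event has occurred
        for nxt in next_states(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


if __name__ == "__main__":
    print(sorted(sorted(s) for s in reachable()))
```

Under these assumptions the reachable states are {}, {2}, {1,3}, {2,3} and {1,2,3}. The state {1,3} appears only because of the functional dependency.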


6.2. Fault-Tree-to-Markov-Chain Conversion Algorithm 

An arbitrary fault tree can be converted into an equivalent Markov chain with a fault-tree-to-Markov-chain conversion algorithm. The original version of this algorithm is described in detail in reference 49. This original algorithm has been expanded to allow the addition of sequence-dependency gates to the standard set of traditional fault tree gates. A sketch of the updated algorithm is as follows:
Figure 52. Markov model interaction between functional dependency gate and cold spare gate.

Algorithm ft2mc

   input component failure rates;
   input and build internal representation for fault tree;
   determine number of basic event nodes;
   determine system “initial operational state”;
   open output file;
   initialize state queue and state table;
   place “initial operational state” onto the state queue;
   while (state queue not empty) do
   {
      remove next originating state from queue;
      for each component i in the state tuple that can still fail do
      {
         simulate a failure of one of component i;
         evaluate the effect on the resulting state of
            Functional Dependency gates;
         evaluate the effect on the resulting state of
            Cold Spare gates;
         evaluate the effect on the resulting state of
            Priority-AND gates;
         evaluate the effect on the resulting state of
            Sequence Enforcing gates;
         look up resulting state in the state table;
         if (resulting state is new, i.e., not in state table) then
         {
            if (not system failure) then
               add resulting state to queue;
            record resulting state in state table;
         }
         output arc from originating state to resulting state;
         undo simulated failure of one of component i;
      } /* End of for loop */
   } /* End of while loop */
   close output file;
end ft2mc.


This algorithm has been modified somewhat to allow multiple transitions from an originating state to a single resulting state to be lumped together. However, the version above is the simplest conceptual expression of the fault-tree-to-Markov-chain conversion algorithm and suffices for the purpose of describing our work here.
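The pseudocode translates fairly directly into Python, for readers who want to experiment. The sketch below simplifies the algorithm in several ways of our own choosing:

- A state is a tuple of operational counts.
- Every operational unit is assumed active, so cold spare bookkeeping is omitted.
- Gate actions are supplied as hypothetical state-to-state functions.
- Parallel arcs between the same pair of states are lumped by adding their rates, in the spirit of the modification just mentioned.

```python
from collections import deque
from typing import Callable, Dict, List, Sequence, Tuple

StateT = Tuple[int, ...]          # operational units remaining for each component type
Arc = Tuple[int, int, float]      # (origin index, result index, rate)


def ft2mc(counts: Sequence[int], rates: Sequence[float],
          system_failed: Callable[[StateT], bool],
          gate_effects: Sequence[Callable[[StateT], StateT]] = ()
          ) -> Tuple[Dict[StateT, int], List[Arc]]:
    initial = tuple(counts)
    table: Dict[StateT, int] = {initial: 0}          # state table
    queue = deque([initial])                         # state queue
    lumped: Dict[Tuple[int, int], float] = {}
    while queue:
        origin = queue.popleft()
        for i, working in enumerate(origin):
            if working == 0:
                continue                             # nothing left to fail
            failed = list(origin)                    # working on a copy replaces "undo"
            failed[i] -= 1                           # simulate failure of one unit of i
            result = tuple(failed)
            for effect in gate_effects:              # e.g. functional dependency gates
                result = effect(result)
            if result not in table:                  # new state
                table[result] = len(table)
                if not system_failed(result):
                    queue.append(result)
            key = (table[origin], table[result])
            # every operational unit is assumed active, so type i fails at working * rate
            lumped[key] = lumped.get(key, 0.0) + working * rates[i]
    return table, [(a, b, r) for (a, b), r in lumped.items()]


if __name__ == "__main__":
    # triad that fails when fewer than two units remain (illustrative rate 1.0)
    print(ft2mc((3,), (1.0,), lambda s: s[0] < 2))
```

For a triad that fails when fewer than two units remain, the sketch produces three states with arcs of rate $3\lambda$ and $2\lambda$.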
Chapter 7

Advanced Modeling Techniques 

This chapter informs the advanced user of some important modeling features of HARP. The first section illustrates a technique for specifying FEHM's directly from a fault tree instead of through the typical FEHM specification. The extent to which the FEHM specification can be implemented in the fault tree notation has not been explored.

The second section addresses the application of HARP's multifault models to a specific class of system architectures that use nearly independent fault containment regions. A study has shown that as the number of fault containment regions increases, HARP's multifault ALL model produces an increasingly conservative result.

7.1. State-Dependent FEHM’s in Fault Trees 

Figure 53 shows the use of the sequence-enforcing gate to force state-dependent FEHM insertion in a fault tree. The FORM without FEHM's for the model in the figure is an or gate with the same top event (FBOX) and basic events 3*1 and 3*2. This notation specifies three units of type 1, shown as P (e.g., a processor) in the figure, and three units of type 2, shown as Q (e.g., buses).

By splitting the P and Q units into groups and renaming the groups P, Q, 2P1, and 2Q1 (1*1, 1*2, 2*3, and 2*4 in HARP notation), different FEHM's can be assigned to the units. In this example, no FEHM's are assigned to P and Q, but the same or different FEHM's can be assigned to the 2P1 and 2Q1 units, respectively. The sequence-enforcing gates enable this capability by precluding the 2P1 and 2Q1 units from failing until the P or Q unit, respectively, fails.

The modeling effect is as follows. Because no FEHM's were specified for the first P or Q failure, a unity fault/error recovery probability (coverage) is modeled for the first of these failures. Subsequent failures have the specified FEHM's inserted into the resulting Markov chain as usual.



Figure 53. State-dependent FEHM's fault tree. (This figure is identical to figure 32.)

7.2. Approximating Multifault Models 

As discussed in section 2.7, HARP offers three simplified models (ALL-inclusive, SAME-type, and USER-defined) for automatic multifault model generation when invoking behavioral decomposition. The penalty for using a simplified multifault model when all failure modes are captured is an increase in the conservatism of the unreliability predictions. The degree of conservatism is a function of the disparity between the FORM and FEHM sojourn times. The advantage of using a simplified multifault model is a considerable reduction in computation and in the user effort needed to define the input for the detailed model.

For most systems the simplified models provide acceptable results (refs. 27, 32, and 34). When this is not the case, the user can modify the generated ASCII files to get an accurate model with behavioral decomposition. Using the AS IS model and using the X Window System (XHARP) are alternative ways to obtain more accurate results (ref. 5).

Figure 54 is an example of the use of the ALL-inclusive model for a system where transitions
out of recovery states are possible (ref. 5). With behavioral decomposition, HARP ignores these
transitions unless the user invokes the ALL-inclusive multifault model, as shown in figure 54.
The system consists of two triads with failure rates λ1 = λ2 = 0.25 × 10^-… /hr and the recovery
rate δ = 0.72 × 10^3/hr (5-sec mean recovery). Recovery is always successful unless a near-coincident
fault occurs. A near-coincident fault in the same triad causes that triad to go off-line,
but the system remains operational. A near-coincident fault in the other triad causes a system
failure. The two triads cannot execute recovery procedures simultaneously, and the system is
operational if at least one triad is operational. For a mission time of 100 hr, the unreliability
is 0.504 × 10^-9. Figure 55 shows the instantaneous model when the ALL-inclusive model is
chosen.



Figure 54. Two-triad fault-tolerant system full model.
Because HARP uses a simple multifault model (the ALL-inclusive model in this case) and
behavioral decomposition, the instantaneous coverage requires that the near-coincident fault
transitions (e.g., from R2 to F3) be modeled as system failures. This instantaneous model
solution produced an unreliability of 0.608 × 10^-9, which is a conservative error of 20.6 percent.
Conservative errors of this magnitude should be acceptable for most applications. If they are not,
the XHARP system can be applied to this model to obtain a more accurate unreliability.
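The 20.6-percent figure is the relative amount by which the instantaneous-model unreliability overstates the full-model value. The Python sketch below reproduces that arithmetic. The full-model value 0.504 × 10^-9 is the value consistent with the stated 20.6 percent, and the helper name is only illustrative.

```python
def conservative_error_percent(approx: float, exact: float) -> float:
    """Percent by which an approximate unreliability overstates a reference value."""
    return 100.0 * (approx - exact) / exact


u_full = 0.504e-9           # full model, 100-hr mission
u_instantaneous = 0.608e-9  # instantaneous (ALL-inclusive) model, same mission time

err = conservative_error_percent(u_instantaneous, u_full)
print(f"conservative error = {err:.1f} percent")  # prints: conservative error = 20.6 percent
```

A positive result means the approximate model is conservative, because it predicts a higher unreliability than the reference model.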

7.3. Markovian Models With Hot Weibull Spares

Markov models with Weibull hot spares behave differently than Markov models with constant
failure rate hot spares. Constant failure rate models exhibit the so-called memoryless property.
That is, when a spare (warm or cold) is switched in, the spare that has not failed behaves as if
it were brand new. By definition, this condition is guaranteed for a cold spare. The constant
failure rate spare does not remember its past use history.

A Weibull spare, by contrast, does remember its use history. So when a hot Weibull spare is
switched in, it behaves as if it had been operating since time zero, with the exception that it was not
allowed to fail until it was switched in. If the Weibull failure rate is decreasing, the hot spare that is
switched in has a lower instantaneous failure rate than a brand new part. The opposite is true
for a cold Weibull spare. When the cold Weibull spare is switched in, its instantaneous failure
rate is at its maximum. Thus, for decreasing failure rates, Weibull cold spares may not increase system
reliability as much as hot Weibull spares. This failure behavior differs from what constant failure
rate parts exhibit (ref. 19).
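The sketch below, in Python, contrasts hot and cold Weibull spares. It assumes the parameterization R(t) = exp(-(λt)^α), where α is the shape parameter. HARP's own Weibull parameterization (ref. 44) may differ. A hot spare enters service carrying the age of the mission clock, so its conditional survival over the next s hours is R(age + s)/R(age). A cold spare enters with age zero. For α < 1 (decreasing rate), the hot spare comes out ahead. For α = 1, the two are identical, which is the memoryless case. For α > 1, the cold spare does better. All function names are hypothetical.

```python
import math


def weibull_reliability(t: float, lam: float, alpha: float) -> float:
    """R(t) = exp(-(lam*t)**alpha); an assumed parameterization (alpha = shape)."""
    return math.exp(-((lam * t) ** alpha))


def weibull_hazard(t: float, lam: float, alpha: float) -> float:
    """Instantaneous failure rate h(t) = alpha*lam*(lam*t)**(alpha-1), for t > 0."""
    return alpha * lam * (lam * t) ** (alpha - 1.0)


def survival_after_switch_in(age: float, s: float, lam: float, alpha: float) -> float:
    """P(spare survives s more hours | it was sound when switched in at `age`).

    A hot spare enters service with age = elapsed mission time; a cold spare
    enters with age 0 because its clock starts at switch-in.
    """
    return weibull_reliability(age + s, lam, alpha) / weibull_reliability(age, lam, alpha)


lam, s, switch_time = 1e-3, 100.0, 500.0
for alpha in (0.5, 1.0, 2.0):  # decreasing, constant, increasing failure rate
    hot = survival_after_switch_in(switch_time, s, lam, alpha)
    cold = survival_after_switch_in(0.0, s, lam, alpha)
    print(f"alpha={alpha}: hot={hot:.4f}  cold={cold:.4f}")
```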

7.4. Non-Markovian Models With Weibull Failure Rates 

A Markov chain with nonconstant failure rates such as the Weibull is called a nonhomogeneous
Markov chain. This stochastic process has one time variable (mission clock) that starts at
time zero with a value of zero. When a cold or warm Weibull spare is introduced into such a model,
another time variable is required that is initiated when the cold or warm spare is activated.
Models that require such additional clocks are non-Markovian processes; that is, they are neither
Markov nor semi-Markov processes, and HARP does not have the capability to solve them.
A specialized Monte Carlo simulation is required for such models.
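To show why a second clock is needed, here is a minimal Monte Carlo sketch in Python. It is not a HARP capability. The sketch considers a primary unit plus one spare, both with Weibull lifetimes (CDF 1 - exp(-(λt)^α), an assumed parameterization), and assumes perfect switching. A hot spare ages on the mission clock, so the pair's life is the longer of the two lifetimes. A cold spare's own clock starts at switch-in, so the pair's life is the sum of the two lifetimes. The function name and parameters are hypothetical. Python's `random.weibullvariate(alpha, beta)` takes the scale first and the shape second.

```python
import random


def simulate_unreliability(mission: float, lam: float, shape: float, cold: bool,
                           trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(primary-plus-one-spare pair fails by `mission`).

    Both units have Weibull lifetimes with CDF 1 - exp(-(lam*t)**shape);
    switching is assumed perfect.
    """
    rng = random.Random(seed)
    scale = 1.0 / lam  # weibullvariate(scale, shape): CDF 1 - exp(-(t/scale)**shape)
    failures = 0
    for _ in range(trials):
        t_primary = rng.weibullvariate(scale, shape)  # runs on the mission clock
        t_spare = rng.weibullvariate(scale, shape)    # spare's own lifetime
        if cold:
            # Second clock: the spare's life starts only when it is switched in.
            system_life = t_primary + t_spare
        else:
            # Hot spare ages on the mission clock from time zero.
            system_life = max(t_primary, t_spare)
        if system_life <= mission:
            failures += 1
    return failures / trials
```

With shape = 1 (constant rates), the estimates can be checked against the closed forms 1 - e^(-λT)(1 + λT) for the cold spare and (1 - e^(-λT))^2 for the hot spare.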

A nonhomogeneous Markov chain with Weibull failure distributions is no longer Markovian if
repair is introduced. As in the models previously discussed, a separate clock must be established
Appendix A

Known Bugs in HARP Version 6.1 

This appendix lists known bugs in HARP version 6.1. These bugs were fixed in version 6.2. 

1. An infinite loop is encountered if the mission time, sampling interval, or variation is entered 
as anything but a number. 

2. The ESPN FEHM model has the following bugs:

• It does not recognize a net in which all transition firing times are constant and therefore 
cannot solve it. 

• When only one exit is reachable in the net or the probability of reaching the exit is 
sufficiently close to 1.0, an infinite loop occurs as the program keeps doubling the trials. 

• Higher moments of time to reach various exits are sometimes incorrectly set to zero. 

• In cases where S exit is a rare event, the outcome of the simulation may vary drastically 
and cause the unreliability of the overall system to change dramatically. 

• In the case of transient faults, the program does not always simulate the net exactly as 
the drawing indicates. 

• The seed for the random generator in the UNIX version can repeat itself; hence, the 
simulation can have undesired correlation. 

3. The ARIES model has the following bugs: 

• It does not calculate each exit probability correctly, and it ignores the variable
that pertains to failure of the recovery hardware. However, the model is correct as defined
in this Technical Paper.

• It incorrectly calculates the moments for each exit in the case that one or more component 
failure times are Weibull distributed. 

4. Certain combinations of cold spare gates and functional dependency gates give rise to a CSP 
gate behavior that has not yet been implemented in HARP. The affected combinations of
gates are somewhat unusual and should rarely be needed. The behavior of the cold spare
gate and its proper uses are defined in full in this Technical Paper. In particular, users of 
HARP version 6.1 should avoid combining functional dependency gates and cold spare gates 
with shared spares in such a way that an input event of one of the cold spare gates is also a 
dependent event of the functional dependency gate. 
Appendix B


Warning and Error Messages 

This appendix contains the warning and error messages for the HARP program. 

+++ WARNING C100: ILLEGAL PARAMETER IN WEIBULL DEVIATE GENERATOR SUBROUTINE,

ALPHA = 0 in DVWEBL +++ 

File: DEVGEN 

Subroutine: DVWEBL 

Meaning: If the FEHM model chosen is the HARP default ESPN model, this model is 
simulated for solution. In the course of the simulation, deviates of the distributions for the 
timed transitions are generated. When the specified distribution is Weibull and alpha, the
shape parameter (ref. 44), is zero, the resulting function is not a distribution. In this case, the 
deviate returned is set to zero. 
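A minimal inverse-transform sketch of a Weibull deviate generator with the documented alpha = 0 behavior is shown below in Python. It assumes the parameterization F(t) = 1 - exp(-(λt)^α). The actual DVWEBL source and its parameterization are not shown in this paper, so the function name and arguments are hypothetical.

```python
import math
import random


def weibull_deviate(rng: random.Random, lam: float, alpha: float) -> float:
    """Inverse-transform Weibull deviate for F(t) = 1 - exp(-(lam*t)**alpha).

    Mirrors the documented DVWEBL behavior for alpha == 0: warn, return zero.
    """
    if alpha == 0.0:
        print("+++ WARNING C100: ILLEGAL PARAMETER IN WEIBULL DEVIATE GENERATOR "
              "SUBROUTINE, ALPHA = 0 in DVWEBL +++")
        return 0.0
    u = 1.0 - rng.random()  # u lies in (0, 1], so log(u) is always defined
    return (-math.log(u)) ** (1.0 / alpha) / lam
```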


+++ WARNING C150 : MORE TRIALS NEEDED FOR NORMAL APPROXIMATION IN SIMULATOR +++ 

File: HARPSIM 

Subroutine: STATS 

Meaning: If the FEHM model chosen is the HARP default ESPN model, this model is 
simulated for solution. This warning may appear during the statistical analysis of the simulation 
data. In estimating the confidence intervals about the exit probabilities, a normal approximation 
to the binomial distribution is used. This approximation is valid if n*p is greater than 5. (This 
rule is discussed more fully in ref. 44.) If n*p is less than 5, then the number of trials is doubled 
and the simulation continues. The initial number of simulation trials run is 1000, so this message
appears if any of the exit probabilities are less than 0.005. 
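The doubling rule for warning C150 can be sketched in Python as follows. The sketch treats the exit probabilities as known, whereas the simulator works from its running estimates. It also adds a cap on the trial count, which is an assumption. Without such a cap, an exit probability of zero would double the trials forever. The function name and defaults are hypothetical apart from the 1000 initial trials and the threshold of 5 stated above.

```python
from typing import Sequence


def trials_for_normal_approximation(exit_probs: Sequence[float],
                                    initial_trials: int = 1000,
                                    threshold: float = 5.0,
                                    max_trials: int = 1 << 30) -> int:
    """Smallest doubled trial count n with n*p >= threshold for every exit."""
    n = initial_trials
    while any(n * p < threshold for p in exit_probs):
        if n >= max_trials:
            # Guard against an exit probability of (nearly) zero.
            raise RuntimeError("exit probability too small for the normal approximation")
        # HARP would print warning C150 here before doubling the trials.
        n *= 2
    return n
```

For example, an exit probability of 0.002 gives n*p = 2 at 1000 trials, so the count doubles twice, to 4000, before n*p reaches 8.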


+++ WARNING C155 : MORE SIMULATION TRIALS ARE NEEDED TO REDUCE PERCENT ERROR
TO WITHIN THE VALUE SPECIFIED BY USER +++ 

File: HARPSIM 

Subroutine: STATS

Meaning: If the FEHM model chosen is the HARP default ESPN model, then this model is 
simulated for solution. This warning may appear during the statistical analysis of the simulation 
data. Confidence intervals about the exit probabilities are generated, and a check is made as to 
the relative size of the interval. If the band/estimate (*100) is greater than the percent error 
specified by the user, then more trials are needed to reduce the width of the interval (band). 
In this case, the number of trials is doubled, and the simulation continues. The initial number 
of simulation trials run is 1000; this number is doubled until percent error in all three exit 
probabilities is less than the value specified by the user. If this message appears more than 5 or 
6 times, the simulation may be long (maybe an hour), and the user may want to reduce the 
percent error requested. Each time this message appears, the percent error for the exit being 
analyzed is displayed. 
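The percent-error check behind warning C155 can be sketched in Python as follows. The paper does not define "band" precisely. This sketch assumes it is the half-width of a 95-percent normal-approximation interval, z·sqrt(p(1 - p)/n) with z = 1.96. The exit probabilities are treated as known, and all names are hypothetical.

```python
import math
from typing import Sequence


def percent_error(p: float, n: int, z: float = 1.96) -> float:
    """Assumed interval half-width as a percent of the estimate p."""
    band = z * math.sqrt(p * (1.0 - p) / n)
    return 100.0 * band / p


def trials_for_percent_error(exit_probs: Sequence[float], target_percent: float,
                             initial_trials: int = 1000, z: float = 1.96,
                             max_doublings: int = 40) -> int:
    """Double the trial count until every exit meets the requested percent error."""
    n = initial_trials
    for _ in range(max_doublings + 1):
        if max(percent_error(p, n, z) for p in exit_probs) <= target_percent:
            return n
        # HARP would print warning C155 here, showing the percent error reached.
        n *= 2
    raise RuntimeError("target percent error not reached")
```

Under these assumptions, a request of 1 percent on an exit probability of 0.5 needs six doublings (64 000 trials). This illustrates why the text suggests relaxing the requested percent error when the warning repeats.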
**** ERROR C500 : ILLEGAL VALUE FOR COVVAL ***

File: COVFAC

Subroutine: COVNOM

Meaning: This error is displayed whenever the numeric value for a coverage factor is greater
than 1 or less than zero. If all the inputs are correct, this error signifies that the basic justification
for the behavioral decomposition has been violated. That is, the average amount of time spent in
the coverage model is comparable with the time between failures rather than differing from it by
orders of magnitude.


**** ERROR C600 : CALLING FOR A DEVIATE FOR NODIS ***
File: DEVGEN

Subroutine: DEVGEN

Meaning: If the FEHM model chosen is the HARP default ESPN model, then this model
is simulated for solution. During the simulation, deviates of the distributions for the timed
transitions are generated. If the user has specified “No Distribution” (believing that the
corresponding transition would never be enabled) and a deviate is called for, then this error is
printed and execution halts.


**** ERROR C750 : FEHM FILE NOT FOUND *** 

File: COVFAC 

Subroutine: COVNOM or COVVAR 

Meaning: This error appears if the file containing the parameters to be used for the FEHM
file does not exist. The name of the file requested is displayed. The obvious correction is to be 
certain that the FEHM files listed in the dictionary do indeed exist. 


**** ERROR C755 : UNRECOGNIZED FEHM TYPE ***
File: COVFAC

Subroutine: COVNOM or COVVAR

Meaning: The first line of a file containing the parameters for a FEHM file states the model
type being described. If the COVFAC routine does not recognize the file type, this message is
printed, along with the first line of the FEHM parameter file and the name of the FEHM file.


**** ERROR C900 : INVALID RATE FOR EXPONENTIAL DIST
File: DISTS

Subroutine: EXP or MEXP

Meaning: This message appears if the rate parameter (1/mean) for the (negative) exponential
distribution (for the DISTRIBUTIONS FEHM model) is less than or equal to zero. To correct this
error, check the FEHM file(s) to be certain that the specified value of any rate parameter is
positive.




**** ERROR C905 : SCALE PARAMETER FOR WEIBULL IS ZERO
File: DISTS

Subroutine: WEIBL or MWEIBL

Meaning: This message appears if the scale parameter for the Weibull distribution of time to exit
the FEHM model is illegal. To correct this error, check the FEHM parameter file(s) to be certain
that the scale (rate) parameter for the Weibull distribution is positive.


**** ERROR C910 : SHAPE PARAMETER FOR WEIBULL IS ZERO
File: DISTS

Subroutine: WEIBL or MWEIBL

Meaning: This message appears if the shape parameter for the Weibull distribution of time to exit
the FEHM model is illegal. To correct this error, check the FEHM parameter file(s) to be certain
that the shape (alpha) parameter for the Weibull distribution is positive.


**** ERROR C915 : HIGH VALUE < LOW VALUE FOR UNIFORM DIST.
File: DISTS

Subroutine: UNIFRM or MUNIF

Meaning: This message appears if the parameters for the uniform distribution of time to exit
the FEHM model are illegal (i.e., if the upper limit is less than or equal to the lower limit).
To correct this error, check the FEHM parameter file(s) to be certain that the upper limit is
greater than the lower limit.


**** ERROR C920 : SCALE PARAMETER FOR GAMMA IS ZERO
File: DISTS

Subroutine: GAMDST or MGMDST

Meaning: This message appears if the scale parameter for the gamma distribution of time to
exit the FEHM model is zero. To correct this error, check the FEHM parameter file(s) to be
certain that the scale (rate) parameter for the gamma distribution is positive.


**** ERROR C925: SHAPE PARAMETER FOR GAMMA IS ZERO
File: DISTS

Subroutine: GAMDST or MGMDST

Meaning: This message appears if the shape parameter for the gamma distribution of time
to exit the FEHM model is zero. To correct this error, check the FEHM parameter file(s) to be
certain that the shape (alpha) parameter for the gamma distribution is positive.


**** ERROR C930 : ILLEGAL PARAMETER FOR HYPEREXP DIST.
File: DISTS

Subroutine: HYPER or MHYPER

Meaning: This message appears if a rate parameter or probability for the hyperexponential
distribution of time to exit the FEHM model is less than or equal to zero. To correct this
error, check the FEHM parameter file(s) to be certain that the probabilities and rates for the
hyperexponential distribution are positive.

**** ERROR C935 : ILLEGAL PARAMETER FOR HYPOEXP DIST. 

File: DISTS 

Subroutine: HYPO or MHYPO 

Meaning: This message appears if a rate parameter for the hypoexponential distribution
of time to exit the FEHM model is less than or equal to zero. To correct this error, check the
FEHM parameter file(s) to be certain that the rates for the hypoexponential distribution
are positive.
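The parameter checks described by errors C900 through C935 can be collected into one Python sketch that returns the code a bad parameter set would trigger. The function name, the distribution keywords, and the parameter dictionary layout are hypothetical. The real FEHM parameter file format is not reproduced here. Following the corrective advice in each entry, the Weibull and gamma checks reject any nonpositive value, not only zero.

```python
from typing import Mapping, Optional


def check_fehm_distribution(kind: str, params: Mapping) -> Optional[str]:
    """Return the HARP error code a bad parameter set would trigger, else None."""
    kind = kind.lower()
    if kind == "exponential":
        return None if params["rate"] > 0 else "C900"
    if kind == "weibull":
        if params["scale"] <= 0:
            return "C905"
        return None if params["shape"] > 0 else "C910"
    if kind == "uniform":
        return None if params["high"] > params["low"] else "C915"
    if kind == "gamma":
        if params["scale"] <= 0:
            return "C920"
        return None if params["shape"] > 0 else "C925"
    if kind == "hyperexponential":
        values = list(params["rates"]) + list(params["probs"])
        return None if all(v > 0 for v in values) else "C930"
    if kind == "hypoexponential":
        return None if all(r > 0 for r in params["rates"]) else "C935"
    raise ValueError(f"unknown distribution kind: {kind}")
```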


++++ WARNING E031 , WEIBULL FAILURE RATE USED WITH REPAIR 
File: FILL 

Subroutine: FILSYM 

Meaning: The model contains both a (time-varying) Weibull failure rate transition and a
constant repair rate transition. The results may be meaningless with this combination.

Action: The user needs to analyze the model and determine if the combination of time-varying
transitions with repair transitions is correct.


++++ WARNING E032, WEIBULL FAILURE RATE USED WITH COLD SPARES
File: FILL 

Subroutine: FILSYM 

Meaning: The model contains both a (time varying) Weibull failure rate transition and a 
cold spare (as specified in the fault tree). The results of the solution of this model are suspect. 

Action: Check the model to be sure that it is the one intended. 


++++ WARNING E033 , BEHAVIORAL DECOMPOSITION ASSUMPTIONS VIOLATED 
File: HARPENG

Subroutine: HARPENG 


Meaning: The model contains states that are too fast (relative to the slowest FEHM). This
warning arises when the mean sojourn time in the fastest state is less than 1000 times the
slowest mean time to exit for any FEHM. The warning is issued to alert the user that predicted
unreliabilities/availabilities may be overly conservative. The “magic number” 1000 was chosen
based on observed typical system models so that this message does not appear too often.


Action: Check the model to be sure that it is the one intended. 
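A Python sketch of the E033 test follows. It assumes the check compares the slowest FEHM mean exit time with the mean sojourn time of the fastest FORM state, as the heading "too fast (relative to the slowest FEHM)" suggests. The function names are hypothetical. The mean sojourn time of a Markov state is the reciprocal of the sum of its outgoing transition rates.

```python
from typing import Sequence


def mean_sojourn_time(total_exit_rate: float) -> float:
    """Mean holding time of a Markov state whose outgoing rates sum to total_exit_rate."""
    return 1.0 / total_exit_rate


def behavioral_decomposition_ok(fehm_mean_exit_times: Sequence[float],
                                state_mean_sojourn_times: Sequence[float],
                                factor: float = 1000.0) -> bool:
    """True if the fastest FORM state is still `factor` times slower than the slowest FEHM."""
    slowest_fehm = max(fehm_mean_exit_times)
    fastest_state = min(state_mean_sojourn_times)
    return fastest_state >= factor * slowest_fehm
```

Both arguments must use the same time unit (hours, for example). A returned False corresponds to the situation in which HARP would issue warning E033.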


++++ WARNING E060 , INVALID INPUT CHARACTER -x- IS IGNORED 
File: SCAN 

Subroutine: ICLASS 

Meaning: A character in the input stream of a symbolic expression cannot be classified as a 
digit, upper or lower case alphabetic, operation sign, or parenthesis. It is ignored. 
Action: If an input file is being used, it should be edited to remove or correct the offending
character. If the character is not printable (i.e., a blank or null character appears between the 
dashes in the error message), the offending character may be a control character. 

++++ SKIP: WARNING E061 - UNEXPECTED END OF FILE 
File: FEHMUTL 

Subroutine: SKIP 

Meaning: The SKIP() subroutine skips a specified number of lines in an input file. If an EOF
is encountered unexpectedly during this operation, this warning message is produced. 

Action: Check the input files for harpeng; one of them might be corrupted.

++++ GERK WARNING E200 , MANY STEPS 
File: GCALL 

Subroutine: GCALL 

Meaning: This message reflects an error code of 3 from the GERK ODE solver. Thus, 
the integration was not completed because more than 9000 derivative evaluations were needed 
(~500 steps). The model may be too stiff for GERK to handle accurately and/or the mission 
time may be too long. 

Action: Determine whether the stiffness is inherent in the model formulation and/or if the 
mission time can be reduced. If the model cannot be changed, then another ODE solver may 
be more appropriate. 


++++ GERK WARNING E201, TOLERANCES RESET: x.xxx-xx y.yyy-yy 
File: GCALL 

Subroutine: GCALL 

Meaning: This message reflects an error code of 4 or 5 from the GERK ODE solver. Code 4 
means that the integration was not completed because the solution vanished, making a pure 
relative error test impossible. Thus, GERK must use a nonzero absolute error tolerance to 
continue. Code 5 means that the integration was not completed because the requested accuracy 
could not be achieved with the smallest allowable stepsize. Thus, GERK must increase the error 
tolerance before continued integration can be attempted. 

Action: No user action is required. The GCALL subroutine automatically sets a positive 
absolute error tolerance for a code 4 return and increases the relative error tolerance by a factor 
of 10 for a code 5 return. 
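The recovery actions that GCALL takes for GERK return codes 4 and 5 can be sketched in Python as follows. The absolute tolerance chosen for a code 4 return is not stated in this paper, so `abs_floor` is an assumed value. The code-to-message table and the function name are illustrative and are drawn from the entries in this appendix.

```python
from typing import Tuple

# Hypothetical mapping from GERK return codes to the HARP messages in this appendix.
GERK_MESSAGES = {
    3: "E200 MANY STEPS",
    4: "E201 TOLERANCES RESET",
    5: "E201 TOLERANCES RESET",
    6: "E202 MUCH OUTPUT",
    7: "E700 IMPROPER CALL TO GERK",
}


def reset_tolerances(code: int, relerr: float, abserr: float,
                     abs_floor: float = 1e-12) -> Tuple[float, float]:
    """Return (relerr, abserr) to use when resuming after a GERK return code."""
    if code == 4:
        # Solution vanished: a pure relative test is impossible, so switch on a
        # positive absolute tolerance. abs_floor is an assumed value.
        return relerr, (abserr if abserr > 0.0 else abs_floor)
    if code == 5:
        # Requested accuracy unattainable at the smallest step: loosen by 10x.
        return relerr * 10.0, abserr
    return relerr, abserr
```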


++++ GERK WARNING E202, MUCH OUTPUT 
File: GCALL 

Subroutine: GCALL

Meaning: This message reflects an error code of 6 from the GERK ODE solver. Thus, GERK
is being used inefficiently in solving the model; too much output is restricting the natural stepsize
choice.

Action: If convenient, reduce the mission time.
++++ GERK WARNING E300, (BND) MANY STEPS

File: GCALL

Subroutine: GCALL2

Meaning: This message reflects an error code of 3 from the GERK ODE solver. Thus,
the integration was not completed because more than 9000 derivative evaluations were needed
(~500 steps). The model may be too stiff for GERK to handle accurately and/or the mission
time may be too long.

Action: Determine whether the stiffness is inherent in the model formulation and/or if the
mission time can be reduced. If the model cannot be changed, then another ODE solver may
be more appropriate.


++++ GERK WARNING E301, (BND) TOLERANCES RESET: x.xxx-xx y.yyy-yy 
File: GCALL

Subroutine: GCALL2 


Meaning: This message reflects an error code of 4 or 5 from the GERK ODE solver. Code 4 
means that the integration was not completed because the solution vanished, making a pure 
relative error test impossible. Thus, GERK must use a nonzero absolute error tolerance to 
continue. Code 5 means that the integration was not completed because the requested accuracy 
could not be achieved with the smallest allowable stepsize. Thus, GERK must increase the error 
tolerance before attempting continued integration. 

Action: No user action is required. The GCALL2 subroutine automatically sets a positive
absolute error tolerance for a code 4 return and increases the relative error tolerance by a factor 
of 10 for a code 5 return. 


++++ GERK WARNING E302, (BND) MUCH OUTPUT 
File: GCALL 

Subroutine: GCALL2 


Meaning: This message reflects an error code of 6 from the GERK ODE solver. Thus, GERK
is being used inefficiently in solving the model; too much output is restricting the natural stepsize
choice.


Action: If convenient, reduce the mission time. 


**** ERROR E510 , NUMBER OF STATES OUT OF RANGE 
File: FILL 

Subroutine: FILL 

Meaning: The number of states specified for the model was either less than one or more than 
the maximum number allowed by the program. 

Action: Redefine the number of states for the model. If a larger model is to be run, consult 
section 5.3 of this Technical Paper. 
**** ERROR E511, ROW AND COLUMN OUT OF ORDER
File: FILL

Subroutine: FILL

Meaning: The program requires that matrix entries be entered in row-major order so that
the sparse matrix data structure can be built properly (i.e., rows must be entered in ascending
order, and within a row columns must be in ascending order). This message indicates that the
program detected an entry out of this order.


**** ERROR E512 , ROW AND COLUMN OUT OF RANGE 
File: FILL 

Subroutine: FILL 

Meaning: The row or column index used to specify a matrix entry is greater than the number
of states.

Action: Rerun fiface for the model.
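The ordering rule behind E511 and the range rule behind E512 can both be checked in one pass over the entries, as in the Python sketch below. The sketch assumes 1-based (FORTRAN-style) indices and treats a repeated (row, column) pair as out of order. Both are assumptions, and the function name is hypothetical. Python's tuple comparison gives row-major order directly.

```python
from typing import Iterable, Optional, Tuple


def check_matrix_entries(entries: Iterable[Tuple[int, int]], n_states: int) -> Optional[str]:
    """Return "E512", "E511", or None for a stream of (row, column) entries."""
    previous: Optional[Tuple[int, int]] = None
    for row, col in entries:
        if not (1 <= row <= n_states and 1 <= col <= n_states):
            return "E512"  # index outside the state space
        if previous is not None and (row, col) <= previous:
            return "E511"  # not strictly ascending in row-major order
        previous = (row, col)
    return None
```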


**** ERROR E530, PARAMETER TYPE UNRECOGNIZED 
File: FILL 

Subroutine: FILSYM 

Meaning: When reading from an echo file, the parameter type given does not match a defined
value.

Action: Rerun the model through harpeng without the echo file.


**** ERROR E550 , NEW SYMBOL ADDED IN NEXT FAULT RATE EXPRESSION 
File: GET 

Subroutine: GETNF 

Meaning: To reduce the number of symbol definition and evaluation passes over the symbol
table, no new parameters may be defined by the next fault rate symbolic expression. This error
indicates that a new parameter was introduced in the next fault rate expression.

Action: Be sure that any symbols that appear in the near-coincident fault rate expression 
appear in the dictionary. 

**** ERROR E570, INCORRECT SYNTAX - EXPRESSION INCOMPLETE 
File: SYM 

Subroutine: SYMINP 

Meaning: The symbolic expression was prematurely terminated by a semicolon. The expression
may end with an operation sign (+, -, *), or it may lack a closing parenthesis.

Action: Rerun fiface for the model.





**** ERROR E580 , INCORRECT SYNTAX, UNEXPECTED SYMBOL 
OFFENDING SEQUENCE IS: 

File: STORE 

Subroutine: STORE 

Meaning: The symbolic expression does not conform to the proper syntax as implemented in 
the syntax table. 

Action: Correct the symbolic expression. The following are usual suspects: 

1. No multiplicative sign between a token and its coefficient 

2. Double --, ++, or ** signs

3. No semicolon 
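The usual suspects listed for E580, together with the premature termination described for E570, can be screened for before rerunning fiface. The Python sketch below does this. It is a rough, hypothetical lint rather than HARP's syntax table. It assumes that expressions end with a semicolon, that a number written directly before a name (such as 3LAMBDA) is a missing multiplicative sign, and that scientific notation such as 1E-5 is allowed.

```python
import re
from typing import List

# A number not preceded by a name character, directly followed by a letter,
# excluding scientific-notation exponents such as 1E-5.
_COEFF_THEN_TOKEN = re.compile(r"(?<![A-Za-z_.\d])\d+(?:\.\d*)?(?![eE][+-]?\d)[A-Za-z_]")


def expression_suspects(expr: str) -> List[str]:
    """List likely syntax problems in a symbolic rate expression (heuristic)."""
    problems: List[str] = []
    text = expr.strip()
    if not text.endswith(";"):
        problems.append("no semicolon")
    body = "".join(text.rstrip(";").split())  # drop the terminator and all whitespace
    if re.search(r"\+\+|--|\*\*", body):
        problems.append("double sign")
    if _COEFF_THEN_TOKEN.search(body):
        problems.append("no multiplicative sign between a coefficient and a token")
    if body and body[-1] in "+-*/":
        problems.append("ends with an operation sign")
    if body.count("(") != body.count(")"):
        problems.append("unbalanced parentheses")
    return problems
```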

**** ERROR E590, TRYING TO DEALLOCATE WRONG TERM 
File: ALLOC 

Subroutine: DALLCT 

Meaning: A call to the DALLCT subroutine has been made with a pointer to a TERM node 
that was not the most recently allocated. 

Action: A serious logical error has occurred in the program and should be reported along 
with a copy of the input files for this model to the first author of this Technical Paper. 

**** ERROR E591 , POINTER TO TERM LIST NEGATIVE 
File: ALLOC 

Subroutine: DALLCT 

Meaning: A call to the DALLCT subroutine has been made before a TERM node has been 
allocated. 

Action: A serious logical error has occurred in the program and should be reported along
with a copy of the input files for this model to the first author of this Technical Paper. 

**** ERROR E600 , OUT OF SPACE FOR TERM LIST 
File: ALLOC 

Subroutine: ALLOCT 

Meaning: More TERM nodes are required to represent the model than are currently allocated 
by the program. 

Action: To run such a large model, the program must be recompiled with more storage space
for the data structures FACLHD and NXTTRM. See section 5.3 of this Technical Paper. 

**** ERROR E610 , OUT OF SPACE FOR FACTOR LIST 
File: ALLOC 

Subroutine: ALLOCF 

Meaning: More FACTOR nodes are required to represent the model than are currently 
allocated by the program. 
Action: To run such a large model, the program must be recompiled with more storage space
for the data structures FACTYP, SYMENT, and NXTFAC. See section 5.3 of this Technical
Paper.


**** ERROR E620, OUT OF SPACE FOR SYMBOL TABLE 
File: ALLOC 

Subroutine: ALLOCS 


Meaning: More symbol table entries are required to represent the model than are currently 
allocated by the program. 

Action: To run such a large model, the program must be recompiled with more storage space
for the symbol table. See section 5.3 of this Technical Paper.


**** ERROR E700 , IMPROPER CALL TO GERK 
File: GCALL 

Subroutine: GCALL 


Meaning: This message reflects an error code of 7 from the GERK ODE solver. Thus, a
call to the GERK subroutine has been made with invalid input parameters. Possible reasons
are NEQN ≤ 0; T = TOUT and IFLAG ≠ +1 or -1; RELERR < 0; ABSERR < 0;
IFLAG = 0; IFLAG < -2; or IFLAG > 7.

Action: A serious logical error has occurred in the program and should be reported along
with a copy of the input files for this model to the first author of this Technical Paper. There is
no intermediate circumvention.
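The improper-call conditions listed for code 7 can be written out as a Python pre-call check. The argument names follow the GERK parameters named above, but the function itself is a hypothetical illustration. It is not part of GERK or HARP.

```python
from typing import List


def gerk_call_problems(neqn: int, t: float, tout: float, iflag: int,
                       relerr: float, abserr: float) -> List[str]:
    """List the improper-call conditions (GERK code 7) that a call would violate."""
    problems: List[str] = []
    if neqn <= 0:
        problems.append("NEQN <= 0")
    if t == tout and iflag not in (1, -1):
        problems.append("T = TOUT and IFLAG is not +1 or -1")
    if relerr < 0:
        problems.append("RELERR < 0")
    if abserr < 0:
        problems.append("ABSERR < 0")
    if iflag == 0 or iflag < -2 or iflag > 7:
        problems.append("IFLAG out of range")
    return problems
```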


**** ERROR E710 , WRONG FUNCTION CODE TO SETVAL
File: SET

Subroutine: SETVAL

Meaning: A call to the SETVAL subroutine has been made with an undefined function code.

Action: A serious logical error has occurred in the program and should be reported along
with a copy of the input files for this model to the first author of this Technical Paper. There is
no intermediate circumvention.


**** ERROR E720, NEGATIVE TRANSITION RATE
File: EVAL

Subroutine: SYMEVL

Meaning: One of the off-diagonal symbolic transition rates has appeared as a negative
number.

Action: Check the values assigned to the symbols, and check the numeric evaluation of the
rate.


**** ERROR E800, (BND) IMPROPER CALL TO GERK 
File: GCALL 
Subroutine: GCALL2

Meaning: This message reflects an error code of 7 from the GERK ODE solver. Thus, a
call to the GERK subroutine has been made with invalid input parameters. Possible reasons
are NEQN ≤ 0; T = TOUT and IFLAG ≠ +1 or -1; RELERR < 0; ABSERR < 0;
IFLAG = 0; IFLAG < -2; or IFLAG > 7.

Action: A serious logical error has occurred in the program and should be reported along
with a copy of the input files for this model to the first author of this Technical Paper. There is
no intermediate circumvention.

++++ RDDICT : WARNING F100 - FAILURE RATE VARIABLE, rate, 

TOO LONG - TRUNCATED TO size CHARACTERS ++++ 

File: RDDICT

Subroutine: RDDICT

Meaning: The character string representing a failure rate variable that was read from the
dictionary was longer than the allowable size for a failure rate variable and was truncated to the
maximum legal size.

Action: Correct the offending failure rate variable in the dictionary or use the truncated
one.


++++ FT2MC: WARNING F102 - STATE TABLE TOO SMALL TO 
REMEMBER ALL SYSTEM STATES , RETRYING . . . ++++ 


File: FT2MC

Subroutine: FT2MC

Meaning: To prune the state search tree, system states are remembered by storing them in
a State Table when they are first generated. If a previously generated state is regenerated at
some point, it does not need to be retested for system failure or added to the queue for later
expansion. Both unique nonfailure states and states that cause overall system failure are stored
in the State Table. When the fault tree causes so many of these states to be generated that
the State Table is filled, this message is printed. FT2MC() then tries to reprocess the fault tree by
placing itself in a mode where the states representing system failure are not individually stored;
instead, only a single FE (failure due to exhaustion) state is stored for each component.

The unique nonfailure states are still stored in the State Table. Depending on the fault tree,
the conversion process to a Markov chain may take longer, but this method may allow larger
systems to be processed than could be handled before.

Action: None. If error F508 occurs after FT2MC() makes its second attempt, then the State
Table is too small and the number of states parameters (MSTATS in FT2MC, and TABLEN and
PRIME2 in INISTA (in the CKSTAT.FOR source file)) must be increased to handle a fault tree of
this size. See section 5.3 of this Technical Paper for information on increasing the number of
states HARP can handle.
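The parameter names TABLEN and PRIME2 suggest a fixed-size hash table. The Python sketch below assumes an open-addressing table with double hashing, in which the table length is a prime and a smaller prime sets the probe step. That mapping is a guess, not taken from the CKSTAT source, and the class is hypothetical. It shows how such a table detects repeated states and why it can fill up, the condition that triggers warning F102.

```python
from typing import Hashable, List, Optional


class StateTable:
    """Fixed-size open-addressing hash table using double hashing (sketch)."""

    def __init__(self, table_len: int = 13, prime2: int = 11) -> None:
        # table_len should be prime and prime2 < table_len, so every probe step
        # 1..prime2 is coprime to table_len and the probe visits every slot.
        self.table_len = table_len
        self.prime2 = prime2
        self.slots: List[Optional[Hashable]] = [None] * table_len
        self.count = 0

    def add(self, state: Hashable) -> bool:
        """Store state; return False if it was already present.

        Raises OverflowError when the table is full (HARP prints warning F102).
        """
        h = hash(state)
        start = h % self.table_len
        step = 1 + h % self.prime2
        for i in range(self.table_len):
            idx = (start + i * step) % self.table_len
            slot = self.slots[idx]
            if slot is None:
                self.slots[idx] = state
                self.count += 1
                return True
            if slot == state:
                return False
        raise OverflowError("State Table full")
```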

++++ FT2MC : WARNING F103 - MARKOV CHAIN TRUNCATED BEFORE ANY FAILURE 
EXHAUSTION STATES WERE REACHED ++++ 

File: FT2MC 

Subroutine: FT2MC 
Meaning: When truncation of the Markov chain to a user-specified number of states is
enabled, the generation of states is stopped after the specified number of components have
failed throughout the system. If the redundancy of each component type in the system is larger
than the user-specified truncation cutoff number of failures, then no FE states have been reached
in the Markov chain when FT2MC stops generating states. In that case, this warning message is produced.


++++ CKSTAT: WARNING F200 - STATE n FOUND ALREADY IN STATE TABLE
WHILE ATTEMPTING TO ADD TO THE STATE TABLE ++++

File: CKSTAT

Subroutine: CKSTAT

Meaning: When adding a (supposedly) new system state to the State Table, CKSTAT()
found that the state was already present in the State Table. This action has no functional effect
on the operation of the conversion process, but it may indicate an internal programming problem
somewhere in the FT2MC subsystem.

Action: Report warning message to the first author of this Technical Paper. 


++++ INPTRE : WARNING F300 - MAX BASIC COMPONENTS ALLOWED 
IN FAULT TREE = maxcmpts ++++ 

File: INPTRE 

Subroutine: INPTRE 


Meaning: When entering a textual description of a fault tree, the user has attempted to 
specify more basic component nodes than the system currently allows. 

Action: Rebuild the FT2MC subsystem with a larger value for the maximum number of basic
component nodes allowed; that is, increase the MCMPTS parameter in all FORTRAN source
files, recompile, and relink.


++++ INPTRE: WARNING F301 - MAX NO. NODES ALLOWED IN FAULT TREE = maxnodes ++++ 
File: INPTRE 

Subroutine: INPTRE 


Meaning: When entering a textual description of a fault tree, the user has attempted to
specify more fault tree nodes than the system currently allows.

Action: Rebuild the FT2MC subsystem with a larger value for the maximum number of
fault tree nodes allowed; that is, increase the MNODES parameter in all FORTRAN source
files, recompile, and relink.


**** ENCSTA: ERROR F400 - # FAILURES EXCEED DECLARED LENGTH OF AN ENCODED STATE 
File: ENCSTA 

Subroutine: ENCSTA/DECSTA 

Meaning: ENCSTA() encodes a system state from a tuple of working components to a list
of the components that failed to bring the system from its original state to the current STATE.
DECSTA() performs the corresponding inverse conversion. Both routines declare a vector of
a certain length to hold the encoded state. If more component failures have occurred than can
be stored in the declared vector, this error message is produced. As currently implemented, this
error represents an internal programming error within FT2MC.

Action: Report error to the first author of this Technical Paper. 

**** DECSTA: ERROR F401 - CMPNT TYPE cmp SPECIFIED IN ENCODED STATE IS 
OUT OF RANGE 
File: ENCSTA 

Subroutine: DECSTA 

Meaning: ENCSTA() encodes a system state from a tuple of working components to a list 
of the components that failed to bring the system from its original state to the current STATE. 
DECSTA() performs the corresponding inverting conversion. If during the decoding process 
DECSTA() encounters in the list of failed components a component type that is not present in 
the system, this error message is produced. This error represents an internal programming error
within FT2MC.

Action: Report error to the first author of this Technical Paper. 
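The encoding described for ENCSTA() and DECSTA(), and the two error conditions F400 and F401, can be illustrated with a Python sketch. It assumes that a state is a tuple of working-component counts per component type, that the encoded form is a fixed-length vector of 1-based failed component types padded with zeros, and that the order of failures is not preserved. These layout details and the function names are hypothetical and may not match FT2MC.

```python
from typing import List, Sequence, Tuple


def encode_state(initial: Sequence[int], current: Sequence[int], max_failures: int) -> List[int]:
    """Encode working counts as a fixed-length list of failed component types."""
    failed: List[int] = []
    for cmp_type, (n0, n) in enumerate(zip(initial, current), start=1):
        failed.extend([cmp_type] * (n0 - n))
    if len(failed) > max_failures:
        raise OverflowError("F400: # failures exceed declared length of an encoded state")
    return failed + [0] * (max_failures - len(failed))  # 0 marks an unused slot


def decode_state(initial: Sequence[int], encoded: Sequence[int]) -> Tuple[int, ...]:
    """Invert encode_state, rebuilding the tuple of working counts."""
    working = list(initial)
    for cmp_type in encoded:
        if cmp_type == 0:
            continue
        if not 1 <= cmp_type <= len(initial):
            raise ValueError(f"F401: component type {cmp_type} out of range")
        working[cmp_type - 1] -= 1
    return tuple(working)
```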

**** FT2MC : ERROR F500 - FAULT TREE NAME TOO LONG **** 

File: FT2MC 

Subroutine: FT2MC 

Meaning: The modelname passed to FT2MC() was too long and was rejected.

Action: Specify a shorter modelname. 

**** FT2MC : ERROR F504 - ERROR OPENING MARKOV CHAIN OUTPUT FILE filename **** 

File: FT2MC 

Subroutine: FT2MC 

Meaning: The FT2MC() subroutine encountered an error while trying to open the Markov 
chain output file. This error is an operating system error rather than an FT2MC error. 

Action: Consult the operating system manuals for the cause and possible solutions. 

**** OUTARC: ERROR F505 - ERROR retcode RETURNED BY INTCHR() ****

File: FT2MC

Subroutine: OUTARC 

Meaning: The OUTARC() subroutine outputs a Markov chain arc between an originating
system state and a new system state produced when a component fails in the originating state. 
For the component that fails, the number of those components operational before the failure 
influences the transition rate of the arc. Thus, the number of components operational before 
the failure must be converted to a character string and printed. The INTCHR() subroutine 
encountered an error while attempting to convert this number of components to a character 
string. This error represents an internal programming error in FT2MC. 

Action: Report error to the first author of this Technical Paper. 
**** FT2MC : ERROR F510 - TOO MANY STATES GENERATED, MAX = mstats
File: FT2MC 

Subroutine: FT2MC 

Meaning: FT2MC() has generated more than the maximum allowable number of states during
the conversion of the fault tree model into a Markov chain.

Action: Increase the MSTATS parameter in the tdrive source files and recompile the entire
program.

**** EXTID: ERROR F511 - STATE state_tuple NOT FOUND IN LINKED LIST
File: FT2MC 

Subroutine: EXTID 

Meaning: As states are generated during the conversion process, they are stored in a linked 
list. This list makes it possible to determine whether each state has been generated before or 
is being generated for the first time. If the state has been generated before, it will have been 
assigned an external ID number when it was created. EXTID() looks up such a previously 
generated state in the linked list to determine its ID number. This error message occurs when 
a state that should be in the linked list is not found there. 

Action: Report error to the first author of this Technical Paper. 
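The bookkeeping that EXTID() relies on can be sketched in Python as follows. This is a hypothetical illustration, not FT2MC's data structure: the class and method names, the representation of a state as a tuple, and the assumption that external ID numbers are assigned sequentially from 1 in order of creation are all invented for the example. A lookup that fails for a state that should already exist corresponds to error F511.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StateNode:
    state: tuple                        # state tuple (hypothetical representation)
    ext_id: int                         # external ID assigned when the state was created
    next: Optional["StateNode"] = None  # link to the next generated state


class StateList:
    """Singly linked list of generated states, kept in creation order."""

    def __init__(self):
        self.head = None
        self.tail = None
        self.count = 0

    def _find(self, state):
        # Linear walk down the list, as a linked-list search requires.
        node = self.head
        while node is not None:
            if node.state == state:
                return node
            node = node.next
        return None

    def add_or_get(self, state):
        """Return (ext_id, is_new); a first-time state gets the next ID."""
        node = self._find(state)
        if node is not None:
            return node.ext_id, False
        self.count += 1
        node = StateNode(state, self.count)
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node
        return node.ext_id, True

    def ext_id(self, state):
        """Look up a state that must already be in the list (EXTID analogue)."""
        node = self._find(state)
        if node is None:
            raise LookupError(f"state {state} not found in linked list")  # cf. F511
        return node.ext_id
```

The linear search mirrors the linked list described above; a present-day implementation would more likely use a hash table (a Python dict) so that the "generated before?" test takes constant time on average.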

**** FRSTIM : ERROR F512 - INCOMPATIBLE ROOT STATE (state_tuple) FOR
STATE state_tuple
File: CKSTAT 

Subroutine: FRSTIM 

Meaning: FRSTIM() uses a STATE and its parent state RSTATE in performing its function.
At several places, FRSTIM() performs a consistency check between STATE and RSTATE to
ensure that STATE is indeed a proper descendent state of RSTATE. If this consistency check
fails (i.e., it is determined that STATE could not possibly be a descendent state of RSTATE),
then this error message is produced. This error represents an internal programming error in
FT2MC.

Action: Report error to the first author of this Technical Paper. 

**** RDDICT : ERROR F600 - DICTIONARY FILE NOT FOUND **** 

File: DICT 

Subroutine: RDDICT 

Meaning: The RDDICT() subroutine was unable to find the dictionary file.

Action: Make sure the dictionary file for the fault tree (modelname.DIC) exists before the 
FT2MC subsystem is called. 

**** RDDICT: ERROR F601 - DICTIONARY OVERFLOW, MAX NUMBER OF 
COMPONENT TYPES = num **** 

File: DICT 

Subroutine: RDDICT 
Meaning: The RDDICT() subroutine found that the dictionary file for the fault tree contains
more than the maximum number of component types allowed in a fault tree. 

Action: This fault tree cannot be run through FT2MC unless the FT2MC subsystem is rebuilt 
with a larger value for MTYPES, which is the limit for the maximum number of component 
types allowed in a fault tree. 

**** RDDICT: ERROR F602 - ERROR OPENING DICTIONARY FILE **** 

File: DICT 

Subroutine: RDDICT, WRTDCT 

Meaning: The RDDICT() subroutine encountered an error while trying to open the dictionary 
file. This error is an operating system error rather than an FT2MC error. 

Action: Consult the operating system manuals for the cause and possible solutions. 

**** RDDICT: ERROR F603 - UNEXPECTED EOL ENCOUNTERED WHILE PARSING NEXT 
ITEM FROM INPUT LINE, OFFSET = offset **** 

File: DICT 

Subroutine: RDDICT 

Meaning: The NXTWRD() subroutine encountered an unexpected End-Of-Line while reading 
an input line from the dictionary. The dictionary file may be corrupted. 

Action: Check the dictionary file. Recreate it if necessary. 

**** RDDICT: ERROR F604 - ERROR ENCOUNTERED WHILE PARSING NEXT ITEM FROM 
INPUT LINE, OFFSET = offset **** 

File: DICT 

Subroutine: RDDICT

Meaning: The RDDICT() subroutine encountered an error while reading an input line from 
the dictionary. The dictionary file may be corrupted. 

Action: Check the dictionary file. Recreate it if necessary. 

**** RDDICT: ERROR F605 - COMPONENT ENTRIES OUT OF ORDER IN DICTIONARY **** 

File: DICT 

Subroutine: RDDICT 

Meaning: While reading the dictionary file, the RDDICT() subroutine found that an entry
for a component was not in consecutive order with the other entries. The dictionary file may be 
corrupted or the user may have made an error when creating the dictionary file. 

Action: Check the dictionary file. Recreate it if necessary. 

**** PASS1 : ERROR F700 - FAULT TREE DESCRIPTION FILE (filename) NOT FOUND **** 

File: BLDLST 

Subroutine: PASS1 
Meaning: The PASS1() subroutine could not find the graphics specification language input
file containing the fault tree specification. 

Action: Make sure that an input file containing a fault tree specification (either a .FTR 
graphics format input file or a .TXT textual specification language format input file) exists 
before the FT2MC subsystem is called. 


**** PASS1 : ERROR F701 - UNEXPECTED FIRST INPUT ITEM - OBJECT TOO LONG **** 

File: BLDLST 

Subroutine: PASS1 

Meaning: The first item of every line in a graphics specification language format input file is 
either ’N’ (indicating that this line describes a fault tree node) or ’A’ (indicating that this line 
describes an arc). Either item is only one character in length. If the first item read from an
input line is longer than one character in length, then there is an error. The input file is either 
corrupted, or it is not a graphics specification language format input file. 

Action: Make sure the input file <ftreename.FTR> has the correct format (graphics
specification language).


**** PASS1 : ERROR F702 - ERROR OPENING INPUT FILE filename **** 

File: BLDLST 

Subroutine: PASS1 

Meaning: The PASS1() subroutine encountered an error while trying to open the input file
containing the fault tree specification in graphics specification language format. This error is an 
operating system error rather than an FT2MC error. 

Action: Consult the operating system manuals for the cause and possible solutions. 

**** PASS1 : ERROR F703 - ERROR ENCOUNTERED WHILE PARSING NEXT WORD FROM 
INPUT LINE: line **** 

File: BLDLST 

Subroutine: PASS1 

Meaning: The NXTWRD() subroutine encountered an error while reading an input line from
the fault tree specification input file. The input file may be corrupted or contain errors. 

Action: Check the input file for format errors. Recreate it if necessary with tdrive or the
graphics fault tree input facility. 


**** PASS1 : ERROR F704 - INDEX LIST OVERFLOW, TOO MANY FT NODES **** 

File: BLDLST 

Subroutine: PASS1 

Meaning: The fault tree specification read from the input file has so many nodes that it
overflowed the index list table. 

Action: Rebuild the FT2MC subsystem with a larger index list size. The FT2MC()
subroutine calls BLDLST() so that the QUEUE array is used for the index list by BLDLST.
This double duty for the QUEUE array is possible because BLDLST does not use a queue and
FT2MC does not use the index list after BLDLST returns. Therefore, space can be saved by
using the same array for both purposes. Consequently, to increase the index list size, increase
QLEN, the size of the QUEUE array, in the FT2MC() subroutine.


**** PASS2 : ERROR F705 - UNEXPECTED FIRST INPUT ITEM - OBJECT TOO LONG **** 


File: BLDLST

Subroutine: PASS2 


Meaning: The first item of every line in a graphics specification language format input file is 
either ’N’ (indicating that this line describes a fault tree node) or ’A’ (indicating that this line 
describes an arc). Either item is only one character in length. If the first item read from an 
input line is longer than one character in length, then there is an error. The input file is either 
corrupted, or it is not a graphics specification language format input file. 

Action: Make sure the input file <MODELNAME.FTR> has the correct format (graphics 
specification language). 


**** PASS2 : ERROR F706 - ILLEGAL FORM FOR M/N GATE LABEL: label **** 

File: BLDLST 

Subroutine: PASS2 

Meaning: This error occurs if the length of the token that is supposed to be a label for an
M/N gate is less than three. An M/N gate label has the form m/n, where m and n are integers.
The label therefore must have a length of at least three. If it does not, then it cannot possibly
be a valid M/N gate label.

Action: Check the input file <ftreename.FTR> for an M/N gate label with an illegal format 
and correct it. 
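A minimal Python sketch of M/N gate label parsing follows. Only the length-of-at-least-three test comes from the description above; the function name parse_mn_label and the additional checks (a single '/' separator with digits on both sides) are assumptions added to make the example complete, and they may be stricter or looser than what PASS2 actually does.

```python
def parse_mn_label(label):
    """Parse an M/N gate label of the form 'm/n' into integers (m, n)."""
    # The shortest possible label, e.g. '2/3', has three characters.
    if len(label) < 3:
        raise ValueError(f"illegal form for M/N gate label: {label}")
    # Assumed extra checks: exactly one '/' with decimal digits on each side.
    m_text, sep, n_text = label.partition("/")
    if not sep or not m_text.isdigit() or not n_text.isdigit():
        raise ValueError(f"illegal form for M/N gate label: {label}")
    return int(m_text), int(n_text)
```

With this sketch, parse_mn_label("2/3") returns (2, 3), while "2/" is rejected by the length test and "a/b" by the digit test.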


**** PASS2 : ERROR F707 - TOO MANY BASIC COMPONENTS (LEAVES) IN FAULT TREE **** 

File: BLDLST 

Subroutine: PASS2 

Meaning: The fault tree specification read from the input file contains more Basic Component 
nodes than the maximum number allowed in a fault tree. 

Action: FT2MC cannot process this fault tree unless FT2MC is rebuilt with a larger value 
for MCMPTS, the maximum number of Basic Component nodes allowed in the fault tree. 


**** PASS2 : ERROR F708 - ILLEGAL FAULT TREE NODE TYPE: nodetype **** 

File: BLDLST 

Subroutine: PASS2 

Meaning: A fault tree node type read from the input file is not one of the supported types 
defined in the GATES COMMON block. The value for <nodetype> printed as part of the 
previous message is an integer value. 

Action: Check the input file <MODELNAME.FTR> for an error in one of the fault tree 
node description lines. 
**** PASS2 : ERROR F709 - LOOKUP IN INDEX LIST FAILED FOR ITEM AT
LOCATION (x , y) **** 

File: BLDLST 

Subroutine: PASS2 


Meaning: The PASS2() subroutine could not find an entry in the index list for one of the
fault tree nodes. This error represents an internal programming error in PASS1 and PASS2.

Action: Report error to the first author of this Technical Paper.


**** PASS2 : ERROR F710 - DEST COORD MISMATCH FOR INCOMING ARC **** 
File: BLDLST 

Subroutine: PASS2 


Meaning: The graphics fault tree input facility records both incoming and outgoing arcs for
the fault tree nodes in the fault tree specification file that it produces for FT2MC. An incoming
arc description specifies both the node at the source of the arc and the node at the destination of
the arc. The destination node for an incoming arc is the node currently being processed. An
outgoing arc description specifies only the node at the destination of the arc. FT2MC only needs
to concern itself with incoming arcs. FT2MC can determine whether an arc description is for an
incoming arc or an outgoing arc by looking for the destination node to be the same as the node
currently being processed. If it is not and no destination node is specified, then the arc is an
outgoing arc and can be ignored. However, if the destination node is specified and it is not the
same as the node currently being processed, then there is an error in the fault tree specification.
This error message is produced in that event.

Action: Check the fault tree specification in the input file <MODELNAME.FTR> and
correct it.
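The decision rule that PASS2 applies to each arc description can be sketched in Python as below. The function name classify_arc is hypothetical, and so is the representation of nodes as (x, y) coordinates, which is suggested only by the "DEST COORD" wording of the message. The three outcomes follow the description above: keep an incoming arc, ignore an outgoing one, and report F710 for a mismatched destination.

```python
def classify_arc(current_node, dest_node):
    """Classify one arc description read while processing current_node.

    dest_node is the destination given in the arc description, or None when
    the description does not specify one.
    """
    if dest_node == current_node:
        return "incoming"   # FT2MC keeps this arc
    if dest_node is None:
        return "outgoing"   # FT2MC can ignore this arc
    # A destination is given, but it is not the node being processed.
    raise ValueError("dest coord mismatch for incoming arc")  # analogue of F710
```

For instance, while processing the node at (10, 20), an arc whose destination is also (10, 20) is classified as incoming, an arc with no destination as outgoing, and an arc whose destination is (30, 40) raises the error.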


**** PASS2: ERROR F711 - ERROR OPENING INPUT FILE filename ****
File: BLDLST 

Subroutine: PASS2 


Meaning: The PASS2() subroutine encountered an error while trying to open the input file
containing the fault tree specification in graphics specification language format. This error is an
operating system error rather than an FT2MC error.

Action: Consult the operating system manuals for the cause and possible solutions. 


**** PASS2 : ERROR F712 - ERROR PARSING NEXT WORD FROM INPUT LINE: line **** 

File: BLDLST 

Subroutine: PASS2 


Meaning: The NXTWRD() subroutine encountered an error while reading an input line from
the fault tree specification input file. The input file may be corrupted or contain errors. 

Action: Check the input file for format errors. Recreate it if necessary with tdrive or the
graphics fault tree input facility. 
**** LOOKUP: ERROR F713 - ILLEGAL FUNCT: f ****

File: BLDLST

Subroutine: LOOKUP

Meaning: The value of f in the previous message is an integer. The LOOKUP() subroutine
looks up a fault tree node in the index list and allows the calling routine either to read or to
write the pointer field (which points into the place table) of the index list entry for the fault tree
node. The calling routine specifies which operation it wants to perform through a subroutine
argument FUNCT, where FUNCT = 0 means read and FUNCT = 1 means write. Any other value
of FUNCT is unsupported and produces this error message. This error represents a programming
error within PASS2.

Action: Report error to the first author of this Technical Paper. 


**** LOOKUP: ERROR F714 - PLACE AT (x,y) NOT FOUND IN INDEX LIST ****
File: BLDLST

Subroutine: LOOKUP 


Meaning: The LOOKUP() subroutine could not find an entry in the index list for one of the
fault tree nodes. This error represents an internal programming error in PASS1 and PASS2.


Action: Report error to the first author of this Technical Paper. 
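The read/write protocol of LOOKUP() and its two error exits (F713 and F714) can be sketched in Python as follows. Modeling the index list as a dictionary keyed by (x, y) location and mapping to a place-table offset is a substitution made for brevity; the FORTRAN routine presumably searches an array. The function name and the READ/WRITE constants are hypothetical, while the FUNCT codes 0 and 1 come from the description of F713.

```python
READ, WRITE = 0, 1  # FUNCT codes described for LOOKUP()


def lookup(index_list, place, funct, pointer=None):
    """Read or write the place-table pointer of the index-list entry for place."""
    if funct not in (READ, WRITE):
        raise ValueError(f"illegal FUNCT: {funct}")  # analogue of F713
    if place not in index_list:
        raise KeyError(f"place at {place} not found in index list")  # analogue of F714
    if funct == WRITE:
        index_list[place] = pointer
    return index_list[place]
```

A caller first writes a pointer with lookup(idx, (1, 2), WRITE, 17) and later reads it back with lookup(idx, (1, 2), READ); both calls return 17.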


**** PASS3: ERROR F715 - CARDINALITY OF INCOMING ARCS (c) FOR m/n GATE 
DO NOT MATCH N = n of M/N GATE **** 

File: BLDLST 


Subroutine: PASS3 

Meaning: The values of c, m, and n in the previous message are integers. The fault tree
specified in the modelname.FTR input file contained an m/n gate whose number of incoming
arcs did not match the parameter n of the gate. Compound arcs (i.e., arcs whose sources are basic
component nodes representing several redundant components) count as several individual arcs
rather than as one arc. For example, a compound incoming arc whose source node is a basic
component node representing three redundant components of type 1 (a 3 1 basic component
node) counts as three individual arcs rather than as one arc.

Action: Examine the MODELNAME.FTR or the MODELNAME.TXT input file and correct
any m/n gates that do not have exactly n incoming arcs (considering compound arcs as described
previously).
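The counting rule for compound arcs can be written out as a short Python sketch. Representing each incoming arc as a (kind, k) pair, where a basic component node contributes k arcs and any other source contributes one, is an assumption made for the example. The function names are likewise hypothetical.

```python
def effective_arc_count(sources):
    """Count the incoming arcs of an m/n gate, expanding compound arcs.

    Each source is ("basic", k) for a basic component node representing k
    redundant components, or ("gate", None) for any other node.
    """
    count = 0
    for kind, k in sources:
        count += k if kind == "basic" else 1
    return count


def check_mn_gate(m, n, sources):
    """Raise an F715-style error unless the expanded arc count equals n."""
    c = effective_arc_count(sources)
    if c != n:
        raise ValueError(
            f"cardinality of incoming arcs ({c}) for {m}/{n} gate does not match n = {n}")
    return c
```

For example, a 2/4 gate fed by one basic component node representing three redundant components plus one ordinary gate has 3 + 1 = 4 effective incoming arcs and passes the check; labeled 2/3, the same gate would be rejected.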


**** PASS3: ERROR F716 - DEPENDENT EVENTS FOR A FUNCTIONAL DEPENDENCY 
GATE MUST BE BASIC EVENTS; ARC arc OF NODE node 
IS NOT A BASIC EVENT 
File: BLDLST

Subroutine: PASS3

Meaning: Functional Dependency gates can have only basic event nodes as dependent events
(i.e., all events after the first, or leftmost, incoming arc to the gate). The trigger event (first, or
leftmost, incoming arc) can be any type of event (i.e., the trigger arc may come from any legal
type of gate or node). If any dependent events of the Functional Dependency gate are not basic
events (i.e., the arc does not come from a basic component node), then this error message is
printed.

Action: Examine the MODELNAME.FTR or the MODELNAME.TXT input file and correct 
any Functional Dependency gates with dependent events that are not basic events. 

**** PASS3: ERROR F717 - ALL DESCENDENT EVENTS FOR A COLD SPARE GATE MUST 
BE UNREPLICATED BASIC EVENTS; ARC arc OF NODE node 
IS NOT A BASIC EVENT 
File: BLDLST 

Subroutine: PASS3 

Meaning: Cold Spare gates can have only unreplicated basic event nodes as descendent events 
(i.e., all incoming arcs must come from basic component nodes). If any descendent events of the 
Cold Spare gate are not basic events, this error message is printed. 

Action: Examine the MODELNAME.FTR or the MODELNAME.TXT input file and correct 
any Cold Spare gates with descendent events that are not basic events.

**** PASS3: ERROR F718 - PRIORITY-AND GATES MUST HAVE 2 INCOMING ARCS;
NODE node HAS numarcs ARCS
File: BLDLST 

Subroutine: PASS3 

Meaning: Priority And gates must have exactly two incoming arcs. If a Priority And gate
has any number of incoming arcs other than two, this error message is printed.

Action: Examine the MODELNAME.FTR or the MODELNAME.TXT input file and correct 
any Priority And gates that have other than exactly two incoming arcs. 

**** PASS3 : ERROR F719 - ALL DESCENDENT EVENTS FOR A COLD SPARE GATE MUST 
BE UNREPLICATED BASIC EVENTS; ARC arc OF NODE node 
IS A REPLICATED BASIC EVENT 
File: BLDLST 

Subroutine: PASS3 

Meaning: Cold Spare gates can have only unreplicated basic event nodes as descendent events 
(i.e., all incoming arcs must come from basic component nodes). If any descendent events of the
Cold Spare gate are replicated basic events, this error message is printed. 

Action: Examine the MODELNAME.FTR or the MODELNAME.TXT input file and correct 
any Cold Spare gates with descendent events that are replicated basic events. 

**** PASS3 : ERROR F720 - COLD SPARE GATES SHARING A SPARE WITH 
OTHER COLD SPARE GATE(S) MUST HAVE 2 
INCOMING ARCS; NODE node HAS numarcs ARCS 
File: BLDLST 

Subroutine: PASS3 
Meaning: Cold Spare gates whose spare (dependent) component is shared with another Cold
Spare gate are restricted to having only one spare. The gate can therefore have only two
incoming arcs: one for the primary component and one for its spare (each of these must be an
unreplicated basic event, as described for error F719). If a Cold Spare gate that shares a spare
with any other Cold Spare gate(s) has any number of incoming arcs other than two, this error
message is printed. NOTE: Cold Spare gates that do not share any spares with other Cold Spare
gates are not subject to this restriction; they may have any number of incoming arcs up to the
maximum (specified by the MARCS parameter).

Action: Examine the MODELNAME.FTR or the MODELNAME.TXT input file and correct 
any Cold Spare gates with shared spares that have other than exactly two incoming arcs. 

**** PASS3 : ERROR F721 - ALL DESCENDENT EVENTS EXCEPT THE LEFTMOST 
EVENT OF A SEQUENCE GATE MUST BE BASIC EVENTS;
ARC arc OF NODE node IS NOT A BASIC EVENT
File: BLDLST

Subroutine: PASS3 

Meaning: Sequence gates can have only (possibly replicated) basic event nodes as descendent 
events (i.e., all incoming arcs must come from basic component nodes) except for the leftmost 
descendent event, which can be any general event. If any descendent events of a Sequence gate 
other than the leftmost are not basic events, this error message is printed. 

Action: Examine the MODELNAME.FTR or the MODELNAME.TXT input file and correct 
any Sequence gates with descendent events that are not basic events. 
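The structural rules that PASS3 enforces for errors F716 through F721 can be summarized in a small Python checker. The Child class, the gate-type strings, and the shares_spare flag are hypothetical stand-ins for FT2MC's place-table entries. The rules themselves follow the descriptions above. Reporting F717 in preference to F719 when a Cold Spare child is not a basic event at all is an assumption about ordering.

```python
from dataclasses import dataclass


@dataclass
class Child:
    is_basic: bool    # True if the arc comes from a basic component node
    replicas: int = 1  # redundant components represented by a basic node


def check_gate(gate_type, children, shares_spare=False):
    """Return the list of error codes for one gate (an empty list means valid).

    children are the gate's incoming arcs, leftmost first.
    """
    errors = []
    if gate_type == "FDEP":
        # The leftmost arc is the trigger and may be any event (F716).
        if any(not c.is_basic for c in children[1:]):
            errors.append("F716")
    elif gate_type == "PAND":
        if len(children) != 2:
            errors.append("F718")
    elif gate_type == "CSP":
        if any(not c.is_basic for c in children):
            errors.append("F717")
        elif any(c.replicas > 1 for c in children):
            errors.append("F719")
        # Gates that share a spare may have only a primary and one spare (F720).
        if shares_spare and len(children) != 2:
            errors.append("F720")
    elif gate_type == "SEQ":
        # Only the leftmost event may be a general event; others may be replicated.
        if any(not c.is_basic for c in children[1:]):
            errors.append("F721")
    return errors
```

For example, a Sequence gate whose leftmost input is a gate and whose second input is a replicated basic event passes, while a Cold Spare gate with a replicated basic event among its inputs yields ["F719"].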

**** TRVTRE : ERROR F800 - TABLE TOO SMALL TO HOLD STACK **** 

File: TRVTRE 

Subroutine: TRVTRE, DEPCHK, CSPCHK, PACHK

Meaning: The TRVTRE() subroutine uses a stack to simulate a recursive traversal of the
fault tree. This stack is stored at the end of the place table constructed by BLDLST(). If
TRVTRE() finds that the stack it needs is too big to fit onto the end of the place table array,
this error message is produced.

Action: FT2MC cannot process this fault tree unless FT2MC is rebuilt with a larger value 
for FTLEN, which is the size of the array containing the place table. 
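Simulating recursion with an explicit, fixed-capacity stack can be sketched in Python as below. The function name, the dictionary representation of the tree, and the capacity argument (standing in for whatever free space remains at the end of the place table) are assumptions for illustration. The sketch does not reproduce how TRVTRE() actually lays out its stack within the FTLEN array.

```python
def traverse(children, root, capacity):
    """Pre-order depth-first traversal with an explicit, bounded stack.

    children: dict mapping each node to the list of its child nodes
    capacity: number of free slots available to hold the stack
    """
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        kids = children.get(node, [])
        if len(stack) + len(kids) > capacity:
            # Analogue of F800: the stack will not fit in the space left over.
            raise RuntimeError("table too small to hold stack")
        # Push in reverse so that the leftmost child is visited first.
        stack.extend(reversed(kids))
    return order
```

For a tree in which "top" has children "g1" and "c3" and "g1" has children "c1" and "c2", traverse(tree, "top", 10) visits top, g1, c1, c2, c3. With a capacity of 1, the same call fails as soon as both children of "top" must be pushed.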

**** TRVTRE: ERROR F801 - ILLEGAL PLACE TYPE (nodetype) FOR PLACE node AT 
OFFSET offset IN PLACE TABLE **** 

File: TRVTRE 

Subroutine: TRVTRE, DEPCHK, CSPCHK, PACHK

Meaning: The values for <nodetype>, <node>, and <offset> in the previous message are 
all integers. The TRVTRE() subroutine detected an illegal fault tree node type stored in the
place table. Thus, the place table is probably corrupted. This error represents an internal 
programming error in BLDLST and/or TRVTRE. 

Action: Report error to the first author of this Technical Paper. 
**** TRVTRE : ERROR F806 - INTERNAL ERROR: PLACE TABLE OFFSET < 0 (neg val)
File: TRVTRE 

Subroutine: TRVTRE, DEPCHK, CSPCHK, PACHK


Meaning: This internal error indicates that a fault tree arc has been flagged as severed and
incorrectly handled.


Action: Report error to the first author of this Technical Paper. 


**** TRVTRE: ERROR F807 - CSP GATE AT OFFSET place IN PLACE TABLE 
NOT FOUND IN TABLE OF CSP GATES (CSPTAB) 

File: TRVTRE 


Subroutine: TRVTRE, DEPCHK, PACHK, SEQCHK

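Meaning: This internal error indicates that the Cold Spare (CSP) gate at offset place in the
place table was not found in the table of Cold Spare gates (CSPTAB).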

Action: Report error to the first author of this Technical Paper. 


**** CSPCHK: ERROR F808 - LINKED LIST OF CSP GATE DESCENDENTS 
(CSPRNT or REPEAT) IS CORRUPTED: 

COMPONENT cmpnt IS NOT A DESCENDENT 
OF CSP GATE gate AT LOCATION cspoff 
IN PLACE TABLE 
File: TRVTRE 

Subroutine: CSPCHK, DETUSD


Meaning: This internal error indicates that a component that was supposed to be a descendent
of a Cold Spare gate according to either the REPEAT linked list or one of the CSPRNT() linked
lists was, in fact, found not to be a descendent of the specified Cold Spare gate according to the
fault tree data structure (the Place Table). The two internal data structures therefore do not
agree.

Action: Report error to the first author of this Technical Paper. 


**** CSPCHK: ERROR F809 - COMPONENT cmpnt HAS SEVERAL CSP GATE 
PARENTS, INCLUDING CSP GATE gate 
(AT LOCATION cspoff IN THE PLACE 
TABLE) WHICH IS SUPPOSED TO HAVE 
NO SHARED SPARES 


File: TRVTRE 

Subroutine: CSPCHK 


Meaning: This internal error occurs when a component is supposed to be a descendent of
several Cold Spare gates (according to the CSPRNT() array of linked lists) and one of these
Cold Spare gates is not supposed to share any of its spares with any other Cold Spare gate
according to the SHRDSP field of its entry in the fault tree data structure (the Place Table).
By definition, the component has to be shared between the several Cold Spare gates to be their
descendent; thus, the Cold Spare gate is either incorrectly marked as not sharing spares, or the
CSPRNT() data structure is corrupted.

Action: Report error to the first author of this Technical Paper. 

**** DETUSD: ERROR F810 - CANNOT TELL HOW MANY OF COMPONENT TYPE 
compnt ARE BEING USED BY CSP GATE gate 
(AT LOC cspoff IN THE PLACE TABLE) 

File: TRVTRE 

Subroutine: DETUSD 

Meaning: This internal error occurs when subroutine DETUSD() is called to try to determine
how many components of a basic event are on-line and in use by a Cold Spare gate that does
not share any of its spares with any other Cold Spare gate. Cold Spare gates that share spares
with other Cold Spare gates have a “components-in-use” descriptor in the state tuple. Cold
Spare gates that do not share any of their spares do not have such a descriptor in the state
tuple. Because DETUSD() determines how many components are in use by examining the
appropriate descriptor, it cannot determine how many of the requested components are in use
(because there is no descriptor to examine). Since DETUSD() should never be called for a Cold
Spare gate that does not share its spares, this error is an internal error.

Action: Report error to the first author of this Technical Paper. 

**** CVRTXT : ERROR F900 - FAULT TREE NAME TOO LONG **** 

File: CVRTXT

Subroutine: CVRTXT 

Meaning: The modelname passed to CVRTXT() was too long and was rejected.

Action: Specify a shorter modelname. 

**** CVRTXT: ERROR F901 - "NODE" MISSING ON INPUT LINE line ****

File: CVRTXT 

Subroutine: CVRTXT 

Meaning: A syntax error occurred in an input line read from the MODELNAME.TXT file; 
the offending line is printed for the user’s inspection. The keyword “NODE” is not present where 
it should be in the input line. 

Action: Edit the input file MODELNAME.TXT (containing the textual description of the 
fault tree) and fix the syntax error in the appropriate line. 

**** CVRTXT: ERROR F902 - TOO MANY NODES, MAX NO. NODES ALLOWED = maxnodes **** 
File: CVRTXT 

Subroutine: CVRTXT 

Meaning: The fault tree described in the MODELNAME.TXT input file contains too many 
nodes. 
Action: Rebuild the FT2MC subsystem with a larger value for the maximum number of
fault tree nodes allowed; that is, increase the MNODES parameter in all FORTRAN source 
files, recompile, and relink. 


**** CVRTXT : ERROR F903 - NODE NUMBER MISSING ON INPUT LINE line **** 

File: CVRTXT

Subroutine: CVRTXT 

Meaning: A syntax error occurred in an input line read from the MODELNAME.TXT file;
the offending line is printed for the user’s inspection. The number that identifies a node is not 
present where it should be in the input line. 

Action: Edit the input file MODELNAME.TXT (containing the textual description of the 
fault tree) and fix the syntax error in the appropriate line. 

**** CVRTXT: ERROR F904 - "TYPE" MISSING ON INPUT LINE line **** 

File: CVRTXT 

Subroutine: CVRTXT 

Meaning: A syntax error occurred in an input line read from the MODELNAME.TXT file;
the offending line is printed for the user’s inspection. The keyword “TYPE” is not present where 
it should be in the input line. 

Action: Edit the input file MODELNAME.TXT (containing the textual description of the 
fault tree) and fix the syntax error in the appropriate line.


**** CVRTXT: ERROR F905 - TOO MANY BASIC COMPONENTS, MAX NO. = maxcmpts **** 
File: CVRTXT

Subroutine: CVRTXT

Meaning: The fault tree described in the MODELNAME.TXT input file contains too many
basic component nodes.

Action: Rebuild the FT2MC subsystem with a larger value for the maximum number of basic
component nodes allowed; that is, increase the MCMPTS parameter in all FORTRAN source
files, recompile, and relink.


**** CVRTXT: ERROR F906 - "OF" MISSING ON INPUT LINE line **** 

File: CVRTXT 

Subroutine: CVRTXT 

Meaning: A syntax error occurred in an input line read from the MODELNAME.TXT file;
the offending line is printed for the user’s inspection. The keyword “OF” is not present where
it should be in the input line. 

Action: Edit the input file MODELNAME.TXT (containing the textual description of the 
fault tree) and fix the syntax error in the appropriate line.

**** CVRTXT: ERROR F907 - "COMPONENT" MISSING ON INPUT LINE line **** 

File: CVRTXT

Subroutine: CVRTXT 
Meaning: A syntax error occurred in an input line read from the MODELNAME.TXT file; the
offending line is printed for the user’s inspection. The keyword “COMPONENT” is not present
where it should be in the input line.

Action: Edit the input file MODELNAME.TXT (containing the textual description of the 
fault tree) and fix the syntax error in the appropriate line. 

**** CVRTXT : ERROR F908 - "INPUT" MISSING ON INPUT LINE line **** 

File: CVRTXT 

Subroutine: CVRTXT 

Meaning: A syntax error occurred in an input line read from the MODELNAME.TXT file;
the offending line is printed for the user’s inspection. The keyword “INPUT” is not present
where it should be in the input line.

Action: Edit the input file MODELNAME.TXT (containing the textual description of the 
fault tree) and fix the syntax error in the appropriate line. 

**** CVRTXT: ERROR F909 - SOURCE NODE NUMBER MISSING ON INPUT LINE line **** 

File: CVRTXT 

Subroutine: CVRTXT 

Meaning: A syntax error occurred in an input line read from the MODELNAME.TXT file;
the offending line is printed for the user's inspection. The number that identifies the node at
the source of an input arc is not present where it should be in the input line.

Action: Edit the input file MODELNAME.TXT (containing the textual description of the 
fault tree) and fix the syntax error in the appropriate line. 

**** CVRTXT: ERROR F910 - "LABEL" MISSING ON INPUT LINE line **** 

File: CVRTXT 

Subroutine: CVRTXT 

Meaning: A syntax error occurred in an input line read from the MODELNAME.TXT file;
the offending line is printed for the user’s inspection. The keyword “LABEL” is not present
where it should be in the input line.

Action: Edit the input file MODELNAME.TXT (containing the textual description of the 
fault tree) and fix the syntax error in the appropriate line. 

**** CVRTXT: ERROR F911 - NUMBER INCOMING ARCS MISSING ON INPUT LINE line **** 

File: CVRTXT 

Subroutine: CVRTXT 

Meaning: A syntax error occurred in an input line read from the MODELNAME.TXT file; 
the offending line is printed for the user’s inspection. The number of incoming arcs for a fault 
tree node is not present where it should be in the input line. 

Action: Edit the input file MODELNAME.TXT (containing the textual description of the 
fault tree) and fix the syntax error in the appropriate line. 
**** CVRTXT : ERROR F912 - TOO MANY INCOMING ARCS, MAX NO. = maxarcs ****
File: CVRTXT 

Subroutine: CVRTXT 


Meaning: The fault tree described in the MODELNAME.TXT input file contains one node 
that has too many incoming arcs. 

Action: Rebuild the FT2MC subsystem with a larger value for the maximum number of 
incoming arcs per node allowed; that is, increase the MARCS parameter in all FORTRAN 
source files, recompile, and relink. 


**** CVRTXT: ERROR F914 - ERROR ENCOUNTERED READING SOURCE OF INCOMING 
ARC arc IN LINE: line **** 

File: CVRTXT 

Subroutine: CVRTXT

Meaning: The NXTWRD() subroutine encountered an error while reading the number of the 
node at the source of an incoming arc. 

Action: Examine the offending input line in the MODELNAME.TXT input file for syntax
errors.


**** CVRTXT: ERROR F915 - ILLEGAL TYPE FOR FAULT TREE NODE --> type ****
File: CVRTXT 

Subroutine: CVRTXT 


Meaning: A syntax error occurred in an input line read from the MODELNAME.TXT file;
the offending line is printed for the user’s inspection. The input line specifies a fault tree node
whose type is unsupported.

Action: Edit the input file MODELNAME.TXT (containing the textual description of the
fault tree) and fix the syntax error in the appropriate line.


**** CVRTXT: ERROR F916 - "SYSTEM-FAILURE" BOX NOT SPECIFIED, FAULT TREE 
INCOMPLETE **** 


File: CVRTXT 

Subroutine: CVRTXT


Meaning: The fault tree description contained in the MODELNAME.TXT file does not 
include a “System-Failure” box (FBOX) at the top node (root) of the fault tree. The FT2MC 
subsystem requires all fault trees to have an FBOX or they cannot be processed. 

Action: Edit the input file MODELNAME.TXT (containing the textual description of the
fault tree) and add an FBOX node at the top of the fault tree. 


**** CVRTXT: ERROR F917 - ERROR OPENING TEXT DESCRIPTION FILE filename **** 
File: CVRTXT 

Subroutine: CVRTXT 
Meaning: The CVRTXT() subroutine encountered an error while trying to open the text
description file. This error is an operating system error rather than an FT2MC error. 

Action: Consult the operating system manuals for the cause and possible solutions. 


**** CVRTXT : ERROR F918 - ERROR OPENING FAULT TREE FILE filename **** 

File: CVRTXT 

Subroutine: CVRTXT 

Meaning: The CVRTXT() subroutine encountered an error while trying to open the fault 
tree file. This error is an operating system error rather than an FT2MC error. 

Action: Consult the operating system manuals for the cause and possible solutions. 


**** CVRTXT: ERROR F919 - NXTWRD() ENCOUNTERED ERROR err PARSING NEXT
ITEM, OFFSET = offset ****


File: CVRTXT 

Subroutine: CVRTXT 


Meaning: The NXTWRD() subroutine encountered an error while parsing the next item on
the input line from the text description file. The offset within the input line where the error
occurred is printed.

Action: Check the text description file. Edit it if necessary to correct any errors.


**** INPTRE: ERROR F920 - FAULT TREE NAME TOO LONG **** 

File : INPTRE 

Subroutine: INPTRE 

Meaning: The modelname passed to INPTRE() was too long and was rejected.
Action: Specify a shorter modelname. 


**** INPTRE: ERROR F921 - RDDICT() RETURNED ERROR err WHILE TRYING TO
READ DICTIONARY FILE ****

File: INPTRE

Subroutine: INPTRE 

Meaning: The RDDICT() subroutine returned an error while trying to read the dictionary file.
Several conditions may cause this message. Look at the error messages that occur immediately 
before this message to determine the cause. 

Action: Report error to the first author of this Technical Paper. 


**** INPTRE: ERROR F922 - ERROR OPENING DICTIONARY FILE filename ****

File: INPTRE

Subroutine: INPTRE 

Meaning: The INPTRE() subroutine encountered an error while trying to open the dictionary 
file. This error is an operating system error rather than an INPTRE error. 

Action: Consult the operating system manuals for the cause and possible solutions. 
**** INPTRE: ERROR F923 - ERROR OPENING TEXT DESCRIPTION FILE filename ****

File: INPTRE 

Subroutine: INPTRE

Meaning: The INPTRE() subroutine encountered an error while trying to open the text 
description file. This error is an operating system error rather than an INPTRE error. 

Action: Consult the operating system manuals for the cause and possible solutions. 

**** CVRTXT: ERROR F940 - UNEXPECTED OR INVALID TOKEN (token) ON LINE:
line 

File: CVRTXT 

Subroutine: CVRTXT 

Meaning: An unexpected or invalid token was detected during the parsing of the indicated
line from the .TXT input file. 

Action: Check the text description file. Edit it if necessary to correct any error(s). 

+++ WARNING 120: CANNOT FIND THE PARAMETER ************* 

IN THE DICTIONARY. RESULTS MAY BE INCORRECT. 

File: COVS 

Subroutine: NEWREP 

Meaning: The user has specified no repair in the model. However, there are rates in the 
model not included in the dictionary. 

Action: To ensure that the results are correct, run the program again and respond yes when
questioned about repair.


+++ WARNING 1110: ASSUMING FIRST STATE ENCOUNTERED IN THE .INT
FILE TO BE THE INITIAL STATE OF THE MODEL! 

File: FIFACE 

Subroutine: MAIN

Meaning: The first line of the .INT file reads either SORTED or UNSORTED. The next line
begins the actual Markov chain entries of the form STATE1 STATE2 RATE. Regardless of the
initial FORM type (fault tree or Markov chain), the first state listed (i.e., STATE1 of the first
entry) must be the initial state of the system. HARP gives the initial state a probability of 1.0.
This warning is only given for UNSORTED input.

Action: If the first state listed is not the initial state, edit the .INT file so that it is the first 
state. 
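The .INT conventions above can be illustrated with a small reader. The first line holds the SORTED or UNSORTED keyword, each later line holds one STATE1 STATE2 RATE entry, and the first source state is taken as the initial state with probability 1.0. This is a minimal Python sketch, not HARP's FIFACE code. The function name read_int_file and the tolerance for a trailing semicolon on each entry are assumptions made for illustration.

```python
# Hypothetical reader for the .INT layout described above; not HARP's FIFACE code.

def read_int_file(text):
    """Return (keyword, initial_state, transitions, initial_probabilities)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0].upper() not in ("SORTED", "UNSORTED"):
        # The condition reported by ERROR 1610.
        raise ValueError("first line must be SORTED or UNSORTED")
    keyword = lines[0].upper()

    transitions = []
    for ln in lines[1:]:
        fields = ln.rstrip(";").split(None, 2)   # STATE1 STATE2 RATE
        if len(fields) != 3:
            raise ValueError(f"malformed transition line: {ln!r}")
        src, dst, rate = fields
        transitions.append((src, dst, rate.strip()))
    if not transitions:
        raise ValueError("no transition entries after the keyword line")

    # STATE1 of the first entry is taken as the initial state (WARNING 1110
    # is printed for UNSORTED input) and receives probability 1.0.
    initial = transitions[0][0]
    if keyword == "UNSORTED":
        print("WARNING 1110: assuming first state encountered is the initial state")
    probs = {}
    for src, dst, _ in transitions:
        for state in (src, dst):
            probs.setdefault(state, 1.0 if state == initial else 0.0)
    return keyword, initial, transitions, probs
```

For example, the input "UNSORTED", "1 2 3*LAM;", "2 3 2*LAM;" yields state 1 as the initial state, with states 2 and 3 starting at probability 0.0.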


+++ WARNING 1120: ONLY 96 COMPONENTS ALLOWED, ALL OTHERS ARE IGNORED!
File: FIFACE

Subroutine: READIC 
Meaning: Only 96 components are acknowledged by HARP. Any additional component types
will not be read into the data structure. 

Action: If the model has more than 96 component types, HARP will not run. The model 
must be reconstructed with fewer component types. 

+++ WARNING 1130: ************** IS GREATER THAN 12 CHARACTERS 

IT WILL BE TRUNCATED TO: ************ 
File: FIFACE

Subroutine: READIC 

Meaning: Component word length in the dictionary is restricted to 12 characters. This 
restriction does not affect the outcome of the program. 

+++ WARNING 1135: EXTRANEOUS LINE FOUND AT END OF DICTIONARY FILE: line 
File: FIFACE

Subroutine: READIC 

Meaning: At the bottom of the dictionary file are listed the state id numbers of all the FEn 
(failure due to exhaustion) states and/or the TAn (truncation aggregation) states. There should 
be no other lines of data after these state id numbers. If any other lines are found following 
them, this warning message is printed. 

Action: If the model has more than 96 component types, HARP will not run. The model 
must be reconstructed with fewer component types. 

+++ WARNING 1150 - CAN'T PARSE *******, NCF RATES MAY NOT BE CONSERVATIVE

File: PARSE, SUMCOF 

Subroutine: PARSE, ALLSET 

Meaning: The string passed to the parser cannot be converted to a numerical value.
Therefore, the NCF rates will be calculated only from the arcs emanating from the target state
and may not be conservative.

Action: To ensure conservative rates, rates should be of the form coefficient * dictionary rate.
A single rate may also be a numerical value or a repair rate.
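The accepted rate forms are coefficient * dictionary rate, a plain numerical value, or a repair rate. A rough check of whether a rate string has one of these forms is sketched below in Python. The helper parse_rate, its arguments, and the sample symbols LAM and MU are illustrative assumptions. HARP's own PARSE routine is not reproduced here. A return value of None corresponds to the situation this warning reports.

```python
def parse_rate(expr, dictionary_rates, repair_rates=()):
    """Classify a rate string (hypothetical helper, not HARP's PARSE routine).

    Returns (coefficient, symbol) for 'coef*SYMBOL' or a bare SYMBOL,
    (value, None) for a plain number, or None if the string fits no form.
    """
    expr = expr.strip().rstrip(";").replace(" ", "")
    try:
        return float(expr), None              # plain numerical value
    except ValueError:
        pass
    if "*" in expr:
        coef_text, _, symbol = expr.partition("*")
        try:
            coef = float(coef_text)
        except ValueError:
            return None
    else:
        coef, symbol = 1.0, expr              # single rate symbol
    if symbol in dictionary_rates or symbol in repair_rates:
        return coef, symbol
    return None                               # cannot parse: rates may not be conservative
```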

+++ WARNING 1170 - CAN'T FIND OVERRIDING FEHM FILENAME FOR LINE , WHERE
File: NXTFLT

Subroutine: HIRFND 

Meaning: HARP cannot find the overriding FEHM file that was declared in the .INT file. 

Action: Send copies of all input files along with the version of the program to the first author 
of this Technical Paper. 

**** WARNING 1190: USE OF OVERRIDING FEHMs 

File: COVS 

Subroutine: TO ADD 
Meaning: Overriding FEHMs are being used in a model where there may be more than one
rate going from the same source state to the same destination state on the same input line.
The program fiface cannot accommodate this situation. If there is only one rate, continue by
answering yes when prompted to continue.

Action: List the rates explicitly for each transition as follows: 

YES: 1 2 3*LAM:NEW;
     1 2 2*MU:OLD;

NO:  1 2 3*LAM:NEW+2*MU:OLD;
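The rewrite from the NO form to the YES form can be automated. The Python sketch below uses a hypothetical helper, split_transition, which is not part of HARP. It assumes the rate terms on a line are joined only by '+'. A negative lookbehind prevents an exponent such as 2.5E+3 from being split, but a symbol ending in a digit followed by E would also be left unsplit.

```python
import re

# '+' separates rate terms, except directly after a digit followed by E/e,
# which is treated as an exponent such as 2.5E+3.
_TERM_SPLIT = re.compile(r"(?<![0-9][Ee])\+")

def split_transition(line):
    """Turn 'S D r1+r2+...;' into one 'S D ri;' line per rate term."""
    src, dst, rates = line.strip().rstrip(";").split(None, 2)
    terms = [t.replace(" ", "") for t in _TERM_SPLIT.split(rates)]
    return [f"{src} {dst} {t};" for t in terms if t]
```

Applied to the NO line above, it returns the two YES lines.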


*** ERROR 1510: ALLOWABLE NUMBER OF TRANSITIONS 
EXCEEDED FOR SORTED INPUT 
File: LD 

Subroutine: LDSORT 


Meaning: The number of allowable transitions for the program has been exceeded. This
number, TRSIZ, is originally set to 10000.

Action: Change the value of TRSIZ in routine INITSZ in fiface.for, recreate the object 
module, and recompile the program. 


*** ERROR 1511: ALLOWABLE NUMBER OF TRANSITIONS 

EXCEEDED FOR UNSORTED OR SYMBOLIC INPUT 
File: LD 

Subroutine: LODFIL

Meaning: The number of allowable transitions for unsorted input has been exceeded. This
number, MCTRZ, is originally set to 2050.

Action: Change the value of MCTRZ in routine INITSZ in fiface.for, recreate the object
module, and recompile the program. Note, it would be better to sort the input (or use numeric
rather than symbolic input) to reduce the run time.


*** ERROR 1520: STATE SIZE EXCEEDED FOR SORTED INPUT 
File: LD 

Subroutine: LDSORT 

Meaning: The number of allowable states for the program has been exceeded. This number,
STSIZ, is originally set to 1000.

Action: Change the value of STSIZ in routine INITSZ in fiface.for, recreate the object module,
and recompile the program.


*** ERROR 1521: STATE SIZE EXCEEDED FOR 
UNSORTED OR SYMBOLIC INPUT 
File: LD 

Subroutine: STNUM 
Meaning: The number of allowable states for unsorted input has been exceeded. This number,
MCSTZ, is originally set to 500. 


Action: Change the value of MCSTZ in routine INITSZ in fiface.for, recreate the object 
module, and recompile the program. Note, it would be better to sort the input (or use numeric 
rather than symbolic input) to reduce the run time. 


*** ERROR 1530: PARAMETER EXCEEDS ALLOWABLE
SIZE OF PSIZE-9, TRUNCATING.

File: LD 

Subroutine: LDPARM


Meaning: The variable PSIZE, set in routine INITSZ of file fiface.for, has been exceeded.
The rate parameter size is set to 32 characters, with 9 characters allowed for the coefficient,
12 for the rate symbol, and the rest for the coverage factor, the multiplicative signs, and the
semicolon (e.g., 1.2345678*RATPARAMETER*C1234567;). The program actually allows only
PSIZE-9 characters, so that the coverage factor can be added without exceeding the size of 32.


Action: The value of PSIZE can be changed in routine INITSZ. However, the HARP engine
also has a hard limit of 13 characters for the rate symbol. Also, change the data structures for
PARMS and MCPARM (currently set to 32).
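The 32-character budget described above can be checked with a little arithmetic. In this hypothetical Python sketch, fit_rate_parameter mimics the truncation to PSIZE-9 characters. Treating the trailing '*' as part of the truncated portion is an assumption. It was chosen so that the example string from the text adds up exactly: 23 characters of coefficient and rate symbol, plus 9 for C1234567;.

```python
PSIZE = 32  # rate-parameter width quoted in the text

def fit_rate_parameter(param, psize=PSIZE):
    """Truncate param to psize-9 characters (hypothetical helper).

    Returns (possibly_truncated_param, was_truncated). The 9 reserved
    characters leave room for a coverage factor such as 'C1234567;'.
    """
    limit = psize - 9
    if len(param) > limit:
        return param[:limit], True   # situation reported by ERROR 1530
    return param, False
```

Appending "C1234567;" to an untruncated 23-character parameter then fills the 32-character field exactly.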


*** FIFACE: ERROR 1610 - MISSING KEYWORD "UNSORTED" OR "SORTED"
IN THE .INT FILE. SHOULD BE THE FIRST LINE.

File: FIFACE 

Subroutine: MAIN 


Meaning: The first line of the .INT file must be one of two keywords: UNSORTED or 
SORTED. 

Action: Edit the .INT file so that the first line reads SORTED if the .INT file is a converted
fault tree or a Markov chain without symbolic input in row-wise order, or UNSORTED if it
is a Markov chain with symbolic input or not in row-wise order.
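The keyword rule in the Action above can be expressed as a small decision function. This Python sketch is hypothetical: choose_keyword is not a HARP routine. It also assumes that "symbolic input" means non-numeric state names and that "row-wise order" means the entries are grouped by nondecreasing source-state number.

```python
def choose_keyword(transitions, from_fault_tree=False):
    """Pick the first-line keyword for a .INT file (hypothetical helper).

    transitions: list of (state1, state2, rate) string tuples.
    """
    if from_fault_tree:
        return "SORTED"                      # converted fault trees are sorted
    if not all(src.isdigit() and dst.isdigit() for src, dst, _ in transitions):
        return "UNSORTED"                    # symbolic state names (assumed meaning)
    sources = [int(src) for src, _, _ in transitions]
    in_row_order = all(a <= b for a, b in zip(sources, sources[1:]))
    return "SORTED" if in_row_order else "UNSORTED"
```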


*** READIC: ERROR 1620 - ERROR IN DICTIONARY FILE LINE: 

File: FIFACE 

Subroutine: READIC 

Meaning: There must be four entries on the dictionary line in the following format. 

1 COMPONENT RATE FEHM 

The program lists the offending line that has either fewer than or more than 4 entries. FEHM 
may be a filename or the keyword NONE or VALUES. 

Action: Edit the .DIC file so that it conforms to the above rules. 
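A dictionary line can be checked against the four-entry rule with a few lines of Python. The sketch below is illustrative only. The name parse_dictionary_line is hypothetical, and the sketch assumes the first entry is an integer index. It also applies the 12-character component-name truncation described under WARNING 1130.

```python
FEHM_KEYWORDS = ("NONE", "VALUES")

def parse_dictionary_line(line):
    """Split '1 COMPONENT RATE FEHM' into its fields (hypothetical helper)."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"ERROR 1620 - ERROR IN DICTIONARY FILE LINE: {line}")
    index, component, rate, fehm = fields
    if len(component) > 12:
        component = component[:12]           # WARNING 1130: truncated to 12 characters
    # FEHM is either a file name or one of the keywords NONE / VALUES.
    fehm_kind = fehm.upper() if fehm.upper() in FEHM_KEYWORDS else "FILE"
    return int(index), component, rate, fehm, fehm_kind
```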


*** RDIDS: ERROR 1625 - NUMBER OF FEIDS/TAIDS DOES NOT MATCH
THE NUMBER OF COMPONENT TYPES IN THE DICTIONARY 
File: FIFACE 

Subroutine: RDIDS 
Meaning: At the bottom of the dictionary file are listed the state id numbers of all FEn
(failure due to exhaustion) states and/or TAn (truncation aggregation) states. Since there can
be one FE/TA state for each type of component in the system, the number of FEn and/or TAn
state ids should be the same as the number of dictionary entries (which define the types of
components in the system). If the number of FEn or TAn state ids listed differs from the
number of component types defined in the dictionary file, this error message is produced.

Action: The .DIC file is probably corrupted. Rerun tdrive to recreate it.

*** ERROR 1710: A ZERO ROW IN ROUTINE ORDER 
File: TRANSPOSE 

Subroutine: ORDER

Meaning: During the transposition, the routine was trying to set the zeroth entry of an array. 
Action: Check the input file for an error. 

*** ERROR 1725: ERROR IN EXPRESSION, CHAR 
File: PARSE 

Subroutine: TOCNVT

Meaning: A character has been found that should not be in the expression. 

Action: Edit the input file to remove the offending sequence. 

*** ERROR 1730: ILLEGAL CHARACTER, ****, IN EXPRESSION
File: PARSE 

Subroutine: OPRTOR

Meaning: A character has been found that should not be in the expression.

Action: Edit the input file to remove the offending sequence.

*** ERROR 1735: STACK IS EMPTY 
File: PARSE 

Subroutine: POP 

Meaning: During parsing, an attempt was made to pop a value off an empty stack.

Action: If a reason for the error cannot be found, then send a copy of the input files and 
version number to the first author of this Technical Paper. 


++++ CONVRT: WARNING U100 - CHAR WORD TO BE CONVERTED TO NUMERIC
CONTAINS NO DIGITS
File: TFHUTL

Subroutine: CONVRT 

Meaning: CONVRT() converts a character string representation of a real number into its
numeric data type representation. If the character string that is purported to contain a real
number in fact contains no digits, this warning is produced. This error probably represents an
internal programming error in tdrive.

Action: In the calling routine, check for a nonnumeric character string being passed to
CONVRT().

++++ CONVRT : WARNING U101 - EXPONENT CONTAINS NO DIGITS > string 
File: TFHUTL 

Subroutine: CONVRT 

Meaning: CONVRT() converts a character string representation of a real number into its
numeric data type representation. If the character string that is purported to contain a real
number contains exponential notation and the exponent contains no digits, this warning message
is produced.

Action: In the calling routine, check for an invalid numeric character string being passed to
CONVRT().

++++ CONVRT: WARNING U102 - INVALID CHARACTERS: characters DETECTED DURING 
CONVERSION OF string FROM CHAR TO NUMERIC 
File: TFHUTL 

Subroutine: CONVRT 

Meaning: CONVRT() converts a character string representation of a real number into its
numeric data type representation. If the character string that is purported to contain a real 
number contains nonnumeric characters, this warning message is produced. 

Action: In the calling routine, check for an invalid numeric character string being passed to
CONVRT().
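The three CONVRT() warnings correspond to three distinct ways a numeric string can be malformed. The following Python sketch is a hypothetical stand-in for CONVRT(), not a transcription of it. Accepting Fortran-style D exponents is an assumption. Malformed strings that fall under neither U100 nor U101 are reported as U102.

```python
import re

# mantissa, optional exponent marker (E or D), optional sign, exponent digits
_NUMBER = re.compile(r"([+-]?\d*\.?\d*)(?:([eEdD])([+-]?)(\d*))?")

def convrt(text):
    """Return (value, None) on success, or (None, code) for U100/U101/U102."""
    s = text.strip()
    if not any(ch.isdigit() for ch in s):
        return None, "U100"                  # string contains no digits
    m = _NUMBER.fullmatch(s)
    if m is None or not any(ch.isdigit() for ch in m.group(1)):
        return None, "U102"                  # invalid or misplaced characters
    mantissa, marker, sign, digits = m.groups()
    if marker and not digits:
        return None, "U101"                  # exponent contains no digits
    exponent = f"e{sign}{digits}" if marker else ""
    return float(mantissa + exponent), None
```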


++++ DBCHR : WARNING U104 - ... ROUNDING TO ZERO 

File: TFHUTL 

Subroutine: DBCHR 


Meaning: DBCHR converts a double precision number (RNUM) to a character representation.
To convert the decimal portion of the number, at each iteration the number (RNUM) is
multiplied by 10 and the integer value subtracted (RNUM = RNUM - INT(RNUM)). Once 
RNUM is less than EPSIL, RNUM is rounded to zero. 

Action: If this is not satisfactory, change the value of EPSIL. It is initially set to 1.0e-6. 
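The digit-extraction loop described for DBCHR can be sketched as follows. This hypothetical Python helper, fraction_digits, produces only the fractional digits. The cap of 10 digits is an assumption taken from the array size quoted under ERROR U807.

```python
import math

EPSIL = 1.0e-6  # initial value quoted in the text

def fraction_digits(rnum, max_digits=10, epsil=EPSIL):
    """Return the decimal digits of the fractional part of rnum (hypothetical)."""
    rnum = abs(rnum)
    rnum -= math.floor(rnum)          # keep only the decimal portion
    digits = []
    while len(digits) < max_digits:
        if rnum < epsil:              # WARNING U104: remainder rounded to zero
            break
        rnum *= 10.0
        digit = int(rnum)
        digits.append(str(digit))
        rnum -= digit                 # RNUM = RNUM - INT(RNUM)
    return "".join(digits) or "0"
```

For example, 2.75 yields "75". A remainder below EPSIL, such as the fractional part of 4.0000001, is treated as zero.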


**** ALLOC: ERROR U400 - BOUNDS OF MEMORY POOL EXCEEDED 
File: DYNMEM

Subroutine: ALLOC, PSHPTR

Meaning: ALLOC() allocates regions of a large buffer array for use in linked list type
applications (emulates a simple dynamic memory facility). If an attempt is made to allocate 
space beyond the end of the buffer array, this error message is produced. 
Action: Increase the size of the buffer array in the calling routine where it is defined (in
this case, increase the MPLEN or DMPLEN parameter, whichever applies, in FT2MC()) and 
recompile the entire program. 
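The memory-pool emulation behind ERROR U400 amounts to handing out consecutive regions of one fixed array. This Python sketch is hypothetical: MemoryPool and alloc are illustrative names, and the sketch never frees regions. It shows only where the bounds check that produces U400 sits.

```python
class MemoryPool:
    """Fixed buffer carved into consecutive regions (hypothetical ALLOC() analog)."""

    def __init__(self, mplen):
        self.buffer = [0] * mplen    # backing array of MPLEN slots
        self.next_free = 0           # index of the first unallocated slot

    def alloc(self, size):
        """Return the start index of a new region of `size` slots."""
        if self.next_free + size > len(self.buffer):
            raise MemoryError("ALLOC: ERROR U400 - BOUNDS OF MEMORY POOL EXCEEDED")
        start = self.next_free
        self.next_free += size
        return start
```

Increasing MPLEN in this sketch corresponds to the Action above.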

**** POPPTR: ERROR U401 - STACK UNDERFLOW 
File: DYNMEM 

Subroutine: POPPTR 

Meaning: An underflow situation (stack is empty) was encountered while trying to pop an 
item from a stack. This error represents an internal programming error in the tdrive program. 

Action: Report error to the first author of this Technical Paper. 

**** GETNOD: ERROR U402 - NODSIZ nodesize IS > MAX NODE SIZE maxnodesize

File: DYNMEM 

Subroutine: GETNOD, DSPNOD

Meaning: While requesting the allocation of memory for a node from the dynamic memory
emulation routines, the caller has requested a node size greater than the declared maximum
allowable node size. This error represents an internal programming error in the tdrive program.

Action: Report error to the first author of this Technical Paper. 

**** GETLIN: ERROR U500 - INPUT LINE TOO LONG, MUST BE <= vldlen CHAR'S 
File: TFHUTL 

Subroutine: GETLIN

Meaning: GETLIN() reads a line from an input file and checks that the line is not longer
than a certain valid length. If a line is longer than the program expects, the program
truncates it to the valid length, losing some of the input. When this occurs, this error
message is produced.

Action: Ensure that the input file contains data of the proper format expected by the
program. 

**** GETLIN: ERROR U501 - ERROR ENCOUNTERED READING INPUT LINE FROM FILE 
File: TFHUTL 

Subroutine: GETLIN 

Meaning: GETLIN() encountered a read error while trying to read a line from an input file.
This error represents an operating system error rather than a GETLIN() error.

Action: Consult the operating system manuals for the cause and possible solutions. 

**** ADD2Q: ERROR U601 - QUEUE OVERFLOW **** 

File: QUEUE 

Subroutine: ADD2Q 

Meaning: A queue overflow occurred during an attempt to add a node to a queue that is 
already full. 
Action: In the calling routine, check for an infinite loop if the queue length is large enough
to hold the queue contents. If the queue really is not large enough to hold its contents, then
increase the size of the queue array.

**** POPQ: ERROR U602 - QUEUE UNDERFLOW **** 

File: QUEUE 

Subroutine: POPQ 

Meaning: A queue underflow occurred during an attempt to remove a node from a queue 
that is already empty. 

Action: In the calling routine, check for a programming error. 

**** WRTQ: ERROR U603 - EXCEEDED BOUNDS OF QUEUE ARRAY **** 

File: QUEUE

Subroutine: WRTQ 

Meaning: While attempting to copy a queue node into the queue array, the bounds of the 
queue array were exceeded before the entire node was copied. This error indicates an internal 
programming error in the queue package. 

Action: Report error to the author of the queue package. 
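The overflow and underflow conditions of errors U601 and U602 arise naturally in a fixed-capacity queue. The following Python sketch uses a circular buffer. That layout and the class name BoundedQueue are assumptions, since the internal layout of the HARP queue package is not described here. The stack errors U701 and U702 that follow are the same checks applied to a last-in, first-out array.

```python
class BoundedQueue:
    """Fixed-capacity FIFO queue mirroring ADD2Q/POPQ errors (hypothetical)."""

    def __init__(self, capacity):
        self.items = [None] * capacity
        self.head = 0      # index of the oldest element
        self.count = 0     # number of stored elements

    def add(self, node):
        if self.count == len(self.items):
            raise OverflowError("ADD2Q: ERROR U601 - QUEUE OVERFLOW")
        tail = (self.head + self.count) % len(self.items)
        self.items[tail] = node
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("POPQ: ERROR U602 - QUEUE UNDERFLOW")
        node = self.items[self.head]
        self.items[self.head] = None
        self.head = (self.head + 1) % len(self.items)
        self.count -= 1
        return node
```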

**** PUSH: ERROR U701 - STACK OVERFLOW **** 

File: STACK 

Subroutine: PUSH 

Meaning: A stack overflow occurred during an attempt to add a node to a stack that is 
already full. 

Action: In the calling routine, check for an infinite loop if the stack length is large enough to
hold the stack contents. If the stack really is not large enough to hold its contents, increase the
size of the stack array.

**** POP: ERROR U702 - STACK UNDERFLOW ****

File: STACK

Subroutine: POP 

Meaning: A stack underflow occurred during an attempt to remove a node from a stack that 
is already empty. 

Action: In the calling routine, check for a programming error. 

*** NXTWRD: ERROR U801 - ERROR PARSING NUMERIC VALUE (token) IN LINE: line *** 

File: UTIL 

Subroutine: NXTWRD 

Meaning: An inappropriate character was found in what is supposed to be a numeric token 
while parsing that token from an input line (e.g., if an alphabetic character appears in what is 
supposed to be a real or integer number). This error indicates either an error in the data in the
input line or a programming error in the calling routine. 

Action: Check the data on the input line passed to NXTWRD to make sure it is correct. 
Then check the calling routine to be sure it is looking for the correct type of token in the input 
line. 

**** NXTWRD: ERROR U802 - ILLEGAL FUNCTION --> f ****

File: UTIL

Subroutine: NXTWRD

Meaning: An illegal value was passed to NXTWRD in the FUNCT argument. FUNCT 
indicates whether NXTWRD should look for a numeric value (FUNCT = ’N’) or a character 
value (FUNCT = ’C’) for the next token on the input line. Any other value for FUNCT is not 
supported. 

Action: In the calling routine, correct the FUNCT argument in the call to NXTWRD.
FUNCT must be either 'N' or 'C'.
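The FUNCT convention can be illustrated with a small tokenizer. This Python sketch is hypothetical. It assumes tokens are separated by whitespace and that the offset is a 0-based character index; neither is stated for the actual NXTWRD routine.

```python
def nxtwrd(line, offset, funct):
    """Return (token_value, new_offset) for the next token (hypothetical).

    funct 'N' parses a number and funct 'C' returns the raw characters;
    any other value is rejected, as with ERROR U802.
    """
    if funct not in ("N", "C"):
        raise ValueError(f"NXTWRD: ERROR U802 - ILLEGAL FUNCTION --> {funct}")
    n = len(line)
    while offset < n and line[offset].isspace():
        offset += 1                          # skip leading blanks
    start = offset
    while offset < n and not line[offset].isspace():
        offset += 1                          # scan to the end of the token
    token = line[start:offset]
    if not token:
        return None, offset                  # nothing left on the line
    if funct == "C":
        return token, offset
    try:
        return float(token), offset
    except ValueError:
        raise ValueError(
            f"NXTWRD: ERROR U801 - ERROR PARSING NUMERIC VALUE ({token}) IN LINE: {line}"
        ) from None
```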

**** OPRNDS: ERROR U803 - OPERATOR op NOT FOUND IN WORD word ****

File: UTIL 

Subroutine: OPRNDS 

Meaning: The character specified as the operator character in a binary operator expression 
was not found in the expression. 

Action: In the calling routine, check the value of the binary operator expression passed to 
OPRNDS. 

**** OPRNDS: ERROR U804 - ILLEGAL FORM FOR BINARY OPERATOR EXPRESSION **** 

File: UTIL 

Subroutine: OPRNDS 

Meaning: The binary operator expression passed to OPRNDS did not have the form: 
OPERAND1 op OPERAND2

Action: In the calling routine, check the value of the binary operator expression passed to 
OPRNDS. 
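The OPERAND1 op OPERAND2 form can be checked as shown in this hypothetical Python sketch of OPRNDS. It assumes the operator appears exactly once and that both operands are nonblank. The real routine may apply different rules.

```python
def oprnds(word, op):
    """Split 'OPERAND1 op OPERAND2' into its two operands (hypothetical)."""
    if op not in word:
        raise ValueError(f"OPRNDS: ERROR U803 - OPERATOR {op} NOT FOUND IN WORD {word}")
    left, _, right = word.partition(op)
    left, right = left.strip(), right.strip()
    if not left or not right or op in right:
        raise ValueError("OPRNDS: ERROR U804 - ILLEGAL FORM FOR BINARY OPERATOR EXPRESSION")
    return left, right
```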

**** INTCHR: ERROR U805 - NUMBER OVERFLOWS CHAR STRING **** 

File: TFHUTL 

Subroutine: INTCHR 

Meaning: The character representation of the integer value passed to INTCHR is too long 
to fit in the output character variable provided to hold it. 

Action: In the calling routine, provide a longer character variable to receive the converted 
numeric value. 
**** DBCHR: ERROR U806 - NUMBER OVERFLOWS CHAR STRING ****

File: TFHUTL 

Subroutine: INTCHR 

Meaning: The character representation of the double value passed to DBCHR is too long to 
fit in the output character variable provided to hold it. 

Action: In the calling routine, provide a longer character variable to receive the converted 
numeric value. 

**** DBCHR: ERROR U807 - MAXLEN TOO SMALL FOR ROUTINE **** 

File: TFHUTL 

Subroutine: DBCHR 

Meaning: The length of the character array is too small for DBCHR. DBCHR concatenates
two arrays of size 10 each, one on each side of a decimal point, so 21 characters are needed.

Action: Set value of MAXLEN in calling routine to 21. 

**** SKPICT: ERROR U900 - UNEXPECTED END-OF-FILE ENCOUNTERED WHILE READING
DICTIONARY FILE 
File: DICUTL 

Subroutine: SKPICT 

Meaning: While the dictionary file was being read, an EOF was encountered before it should 
have been. The dictionary file is probably corrupted. 

Action: Rerun tdrive and recreate the dictionary file. 
GLOSSARY

Most terms unique to reliability modeling and fault-tolerant systems are defined within the
body of each volume of this Technical Paper. The meaning of some terms is well known to
researchers and users of these technologies but may not be familiar to new users of the Hybrid
Automated Reliability Predictor (HARP) integrated reliability (HiRel) tool system. Thus, the
primary purpose of this glossary is to aid new users.


Availability 

Availability is a probabilistic quantity that predicts the operational life of a system that is 
subject to line maintenance (repair). Availability is the probability that a system under repair 
is operational at a specified time. In a Markov chain model representation, repair is modeled 
by adding transitions from states with n+1 failed components to states with n failed components.
The transition rate is given as a repair rate. No fault tree model representation has yet been 
developed to represent an availability model; therefore, a Markov chain model must be given 
to HARP for solution. A fault tree model can be used to specify and generate a preliminary 
Markov chain model that the user needs to modify. 
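Consider a single component with failure rate λ and repair rate μ as a concrete instance of the repair transitions described above. The Markov chain has an operational state and a failed state, plus a repair transition back to the operational state. Its availability has the standard closed form A(t) = μ/(λ+μ) + (λ/(λ+μ))·e^(-(λ+μ)t). The Python sketch below evaluates that formula and cross-checks it by integrating the state equation numerically. It is an illustration, not HARP's solution method, and the rates in the tests are arbitrary.

```python
import math

def availability(t, lam, mu):
    """Point availability of one repairable component.

    Two-state Markov chain: failure rate lam (up -> down) and repair
    rate mu (down -> up), starting in the operational state.
    """
    s = lam + mu
    return mu / s + (lam / s) * math.exp(-s * t)

def availability_euler(t, lam, mu, steps=100_000):
    """Same quantity from dP_up/dt = -lam*P_up + mu*(1 - P_up), forward Euler."""
    p_up, h = 1.0, t / steps
    for _ in range(steps):
        p_up += h * (-lam * p_up + mu * (1.0 - p_up))
    return p_up
```

As t grows, A(t) approaches the steady-state availability μ/(λ+μ). Without the repair transition (μ = 0), the same chain reduces to the reliability e^(-λt).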

Behavioral Decomposition 

Behavioral decomposition is a mathematical approximation technique that reduces a complex
fault/error handling model (FEHM) to a branch point in a Markov chain. The effects of the
FEHM are compensated for by modifying state transition rates. The advantage of this technique
is that it greatly reduces the size of the Markov models to be solved and allows complex,
non-Markovian FEHM behavior to be modeled.

Bounds or Mathematical Bounds 

Large or complex mathematical models often require approximations to keep their solutions 
tractable. Bounds are the numerical expressions of the variation in a computed result due to 
mathematical approximation or uncertainty in the accuracy of the input data to the models. 

Combinatorial Model 

A combinatorial model is a stochastic model that relates combinatorial component failure or
success events to a subsystem or system failure or success, respectively. Combinatorial models 
do not distinguish the order of failure events. 

Coincident Fault 

A coincident fault exists at the same time one or more other faults are present. A coincident 
fault is not a simultaneous fault. 

Conservative Unreliability Result 

Mathematical quantities can be expressed in two forms: in exact form, which is usually a
symbolic representation such as the symbol π, or in an approximate form such as the decimal
representation of π as 3.14159. When approximations are necessary, the difference between the
exact quantity (which may not be obtainable) and the computed result (which is obtainable) is
called the error. A conservative unreliability result is one where the error in the computed result
is in the direction of increased unreliability.
Critical-Pair Fault

A critical-pair fault is a near-coincident fault involving two faults. HARP uses three multifault
models to account for critical-pair faults: ALL, SAME, and USER.

Extended Behavioral Decomposition

Extended behavioral decomposition is a generalized behavioral decomposition technique that
allows multiple FEHM entry/exit transitions and multifault near-coincident modeling.

Fault Tree

A fault tree is a notational model that uses symbols resembling logic gates to relate failure
events of components or subsystems to failure events of a system composed of those components
and subsystems.

Instantaneous Jump Model

An instantaneous jump model is a Markov model that approximates a more complex
semi-Markov model. It is obtained by operating mathematically on the semi-Markov model and
produces a conservative result with respect to that model.

Multifault Model

A multifault model is a fault/error handling model that accounts for two or more faults, none
occurring simultaneously.

Near-Coincident Fault

A near-coincident fault is a second fault that occurs during the time between the occurrence
of a first fault and its recovery.

Near-Coincident Failure

A near-coincident failure is a system failure resulting from a near-coincident fault. To reduce
modeling complexity, every near-coincident fault is assumed to result in a near-coincident
failure. Typically, this assumption produces a conservative result.

Optimistic Unreliability Result

An optimistic unreliability result occurs when the error in the computed result is in the
direction of decreased unreliability.

Primitive

A primitive is any screen image that can be manipulated as a single entity without dissection,
for example, a line, a circle, or a fault tree gate.

Semi-Markov Models

Semi-Markov models are generalizations of Markov models. In particular, semi-Markov
models allow generalized state holding time distributions. Semi-Markov models are required
for fault-tolerant system models to account for fault/error handling times that may not be
exponential.

Sequence-Dependent Model

A sequence-dependent model is a stochastic model that relates ordered component failure
or success events to a subsystem or system failure or success, respectively. Sequence-dependent
models distinguish the order of failure events. These models are more complex than
combinatorial models and are also more difficult to solve.

Simultaneous Fault

A simultaneous fault is a second fault that occurs at exactly the same instant as a first
fault. Markov chain models do not allow such faults.

Weibull Distribution

A Weibull distribution is a two-parameter distribution that can exhibit time-increasing,
time-decreasing, or constant failure rates.